Abstract

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent 
advances in data-stream algorithms. We show that a number of these results follow easily from 
the application of a single probabilistic method called Precision Sampling. Using this method, 
we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector 
$x = (x_1, \ldots, x_n)$, which is useful for the following applications:

• Estimating the $F_k$-moment of $x$, for $k > 2$.

• Estimating the $\ell_p$-norm of $x$, for $p \in [1,2]$, with small update time.

• Estimating cascaded norms $\ell_p(\ell_q)$ for all $p, q > 0$.

• $\ell_1$ sampling, where the goal is to produce an element $i$ with probability (approximately)
$|x_i| / \|x\|_1$. It extends to similarly defined $\ell_p$-sampling, for $p \in [1,2]$.

For all these applications the algorithm is essentially the same: scale the vector x entry-wise 
by a well-chosen random vector, and run a heavy-hitter estimation algorithm on the resulting 
vector. Our sketch is a linear function of x, thereby allowing general updates to the vector x. 

Precision Sampling itself addresses the problem of estimating a sum $\sum_{i=1}^n a_i$ from weak
estimates of each real $a_i \in [0,1]$. More precisely, the estimator first chooses a desired precision
$u_i \in (0,1]$ for each $i \in [n]$, and then it receives an estimate of every $a_i$ within additive $u_i$. Its
goal is to provide a good approximation to $\sum a_i$ while keeping a tab on the "approximation
cost" $\sum_i (1/u_i)$. Here we refine previous work [Andoni, Krauthgamer, and Onak, FOCS 2010],
which shows that as long as $\sum a_i = \Omega(1)$, a good multiplicative approximation can be achieved
using total precision of only $O(n \log n)$.
1 Introduction



A number of recent developments in algorithms for data streams have been inspired, at least in
part, by a technique devised by Indyk and Woodruff [IW05] to obtain near-optimal space bounds
for estimating $F_k$ moments, for $k > 2$. Indeed, refinements and modifications of that technique
were used for designing better or new algorithms for applications such as: $F_k$ moments [BGKS06]
(with better bounds than [IW05]), entropy estimation [BG06], cascaded norms [GBD08, JW09],
Earthmover Distance [ADIW09], $\ell_1$ sampling algorithm [MW10], distance to independence of two
random variables [BO10a], and even, more generically, a characterization of "sketchable" functions
of frequencies [BO10c]. While clearly very powerful, the Indyk-Woodruff technique is somewhat
technically involved, and hence tends to be cumbersome to work with.

In this paper, we show an alternative design for the Indyk-Woodruff technique, resulting in a 
simplified algorithm for several of the above applications. Our key ingredient, dubbed the Precision 
Sampling Lemma (PSL), is a probabilistic method, concerned with estimating the sum of a number 
of real quantities. The PSL was introduced in [AKO10, Lemma 3.12], in an unrelated context of
query-efficient algorithms (in the sense of property testing) for estimating the edit distance.

Our overall contribution here is providing a generic approach that leads to simplification and 
unification of a family of data-stream algorithms. Along the way we obtain new and improved 
bounds for some applications. We also give a slightly improved version of the PSL. 

In fact, all our algorithms consist of the following two simple steps: multiply the stream
by well-chosen random numbers (given by PSL), and then solve a certain heavy-hitters problem.
Interestingly, each of the two steps (separately) either has connections to or is a well-studied problem
in the literature of data streams. Namely, our implementation of the first step is somewhat similar to
Priority Sampling [DLT07], as discussed in Section 1.3. The second step, the heavy-hitters problem,
is a natural streaming primitive, studied at least since the work of Misra and Gries [MG82]. It
would be hard to list all the relevant literature for this problem concisely; instead we refer the
reader, for example, to the survey by Muthukrishnan [Mut05] and the CountMin wiki site [CM10]
and the references therein.



1.1 Streaming Applications 

We now describe the relevant streaming applications in detail. In most cases, the input is a vector
$x \in \mathbb{R}^n$, which we maintain under stream updates. An update has the form $(i, \delta)$, which means
that $\delta \in \mathbb{R}$ is added to $x_i$, the $i$th coordinate of $x$.$^1$ The goal is to maintain a sketch of $x$ of
small size (much smaller than $n$), such that, at the end of the stream, the algorithm outputs some
function of $x$, depending on the actual problem in mind. Besides the space usage, another important
complexity measure is the update time, i.e., how much time it takes to modify the sketch to reflect
an update $(i, \delta)$.

We study the following problems.$^2$ For all these problems, the algorithm is essentially the same
(see the beginning of Section 3). All space bounds are in terms of words, each having $O(\log n)$ bits.

$F_k$ moment estimation, for $k > 2$: The goal is to produce a $(1+\epsilon)$ factor approximation to the
$k$-th moment of $x$, i.e., $\|x\|_k^k = \sum_{i=1}^n |x_i|^k$. The first sublinear-space algorithm for $k > 2$,
due to [AMS99], gave a space bound $n^{1-1/k} \cdot (\epsilon^{-1} \log n)^{O(1)}$, and further showed the first
polynomial lower bound for $k$ sufficiently large. A lower bound of $\Omega(n^{1-2/k})$ was shown in
[CKS03, BJKS04], and it was (nearly) matched by Indyk and Woodruff [IW05], who gave an
algorithm using space $n^{1-2/k} \cdot (\epsilon^{-1} \log n)^{O(1)}$. Further research reduced the space bound to
essentially $O(n^{1-2/k} \cdot \epsilon^{-2-4/k} \log^2 n)$ [BGKS06, MW10] (see [MW10] for multi-pass bounds).
Independently of our work, this bound was improved by a roughly $O(\log n)$ factor in [BO10b].

Our algorithm for this problem appears in Section 3.1, and improves the space usage over
these bounds. Very recently, following the framework introduced here, [Gan11] reports a
further improvement in space for a certain regime of parameters.

1 We make a standard discretization assumption that all numbers have a finite precision, and in particular,
$\delta \in \{-M, -M+1, \ldots, M-1, M\}$, for $M = n^{O(1)}$.

2 Since we work in the general update framework, we will not be presenting the literature that is concerned with
restricted types of updates, such as positive updates $\delta > 0$.



$\ell_p$-norm estimation, for $p \in [1,2]$: The goal is to produce a $1+\epsilon$ factor approximation to $\|x\|_p$,
just like in the previous problem.$^3$ The case $p = 2$, i.e., $\ell_2$-norm estimation, was solved in
[AMS99], which gives a space bound of $O(\epsilon^{-2} \log n)$. It was later shown in [Ind06] how to
estimate the $\ell_p$ norm for all $p \in (0,2]$, using $p$-stable distributions, in $O(\epsilon^{-2} \log n)$ space. Further
research aimed to get a tight bound and to reduce the update time (for small $\epsilon$) from $\Omega(\epsilon^{-2})$
to $\log^{O(1)} n$ (or even $O(1)$ for $p = 2$), see, e.g., [NW10, KNW10, Li08, GC07] and references
therein.

Our algorithm for this problem appears in Section 3.2 for $p = 1$ and Section 4.1 for all
$p \in [1,2]$. The algorithm has an improved update time, over that of [GC07], for $p \in (1,2]$,
and uses comparable space, $O(\epsilon^{-2-p} \log^2 n)$. We note that, for $p = 1$, our space bound is
worse than that of [NW10]. Independently of our work, fast space-optimal algorithms for all
$p \in (0,2)$ were recently obtained in [KNPW11].



Mixed/cascaded norms: The input is a matrix $x \in \mathbb{R}^{n \times n}$, and the goal is to estimate the $\ell_p(\ell_q)$
norm, defined as

$$\|x\|_{p,q} = \Big( \sum_{i \in [n]} \Big( \sum_{j \in [n]} |x_{i,j}|^q \Big)^{p/q} \Big)^{1/p}, \quad \text{for } p, q > 0.$$

Introduced in [CM05b], this problem generalizes the $\ell_p$-norm/$F_k$-moment estimation questions,
and for various values of $p$ and $q$, it has particularly useful interpretations; see [CM05b] for
examples. Perhaps the first algorithm, applicable to some regime of parameters, appeared in
[GBD08]. Further progress on the problem was accomplished in [JW09], which obtains
near-optimal bounds for a large range of values of $p, q > 0$ (see also [MW10] and [GBD08]).

We give in Section 4.2 algorithms for all parameters $p, q > 0$, and obtain bounds that are
tight up to $(\epsilon^{-1} \log n)^{O(1)}$ factors. In particular, we obtain the first algorithm for the regime
$q > p > 2$; no such (efficient) algorithm was previously known. We show that the space
complexity is controlled by a metric property, which is a generalization of the $p$-type constant
of $\ell_q$. Our space bounds fall out directly from bounds on this property.

$\ell_p$-sampling, for $p \in [1,2]$: Here, the goal of the algorithm is to produce an index $i \in [n]$ sampled
from a distribution $D_x$ that depends on $x$, as opposed to producing a fixed function of $x$.
In particular, the (idealized) goal is to produce an index $i \in [n]$ where each $i$ is returned
with probability $|x_i|^p / \|x\|_p^p$. We meet this goal in an approximate fashion: there exists some
approximating distribution $D'_x$ on $[n]$, where $D'_x(i) = (1 \pm \epsilon) |x_i|^p / \|x\|_p^p \pm 1/n^2$ (the exponent
2 here is arbitrary), such that the algorithm outputs $i$ drawn from the distribution $D'_x$. Note
that the problem would be simple if the stream had only insertions (i.e., $\delta > 0$ always); so
the challenge is to be able to support both positive and negative updates to the vector $x$.

3 The difference in notation ($p$ vs. $k$) is partly due to historical reasons: the $\ell_p$ norm for $p \in [1,2]$ has been usually
studied separately from the $F_k$ moment for $k > 2$, having generally involved somewhat different techniques and space
bounds.



The $\ell_p$-sampling problem was introduced in [MW10], where it is shown that the $\ell_p$-sampling
problem is a useful building block for other streaming problems, including cascaded norms,
heavy hitters, and moment estimation. The algorithm in [MW10] uses $(\epsilon^{-1} \log n)^{O(1)}$ space.



Our algorithm for the $\ell_p$-sampling problem, for $p \in [1,2]$, appears in Section 5. It improves
the space to $O(\epsilon^{-p} \log^3 n)$. Very recently, following the framework introduced here, [JST10]
further improve the space bound to a near-optimal bound, and extend the algorithm to
$p \in [0,1]$.

All our algorithms maintain a linear sketch $L : \mathbb{R}^n \to \mathbb{R}^S$ (i.e., $L$ is a linear function), where $S$
is the space bound (in words, or $O(S \log n)$ in bits). Hence, all the updates may be implemented
using the linearity: $L(x + \delta e_i) = Lx + \delta \cdot L e_i$, where $e_i$ is the $i$th standard basis vector.
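
To make the role of linearity concrete, here is a minimal Python sketch of how a stream update $(i, \delta)$ could be applied to a hash-table-based linear sketch. The class name `LinearSketch`, the `scale` argument (a stand-in for the per-coordinate multipliers), and the explicit storage of the hash and sign tables are assumptions made for illustration; a space-efficient implementation would instead evaluate pairwise independent hash functions on the fly.

```python
import random


class LinearSketch:
    """Illustrative linear sketch: l hash tables with m cells each.

    Cell (j, h_j(i)) accumulates g_j(i) * scale[i] * x_i, so the whole
    structure is a linear function of the input vector x.
    """

    def __init__(self, n, m, l, scale, seed=0):
        rng = random.Random(seed)
        self.scale = scale  # hypothetical per-coordinate multipliers
        # Explicit tables for clarity only; this costs O(n * l) memory.
        self.h = [[rng.randrange(m) for _ in range(n)] for _ in range(l)]
        self.g = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(l)]
        self.tables = [[0.0] * m for _ in range(l)]

    def update(self, i, delta):
        """Apply the stream update (i, delta), i.e. x_i += delta."""
        for j, table in enumerate(self.tables):
            table[self.h[j][i]] += self.g[j][i] * self.scale[i] * delta
```

Because every update only adds a multiple of a fixed vector to the tables, inserting and then deleting the same amount leaves the sketch unchanged, which is what allows negative updates.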

1.2 Precision Sampling 

We now describe the key primitive used in all our algorithms, the Precision Sampling Lemma
(PSL). It originally appeared in [AKO10]. The present version is improved in two respects: it
has better bounds and is streaming-friendly.

PSL addresses a variant of the standard sum-estimation problem, where the goal is to estimate
the sum $\sigma = \sum_i a_i$ of $n$ unknown quantities $a_i \in [0,1]$. In the standard sampling approach,
one randomly samples a set of indices $I \subset [n]$, and uses these $a_i$'s to compute an estimate such
as $\frac{n}{|I|} \sum_{i \in I} a_i$. Precision sampling considers a different scenario, where the estimation algorithm
chooses a sequence of precisions $u_i \in (0,1]$ (without knowing the $a_i$'s), and then obtains a sequence
of estimates $\hat a_i$ that satisfy $|\hat a_i - a_i| \le u_i$, and it has to report an estimate for the sum $\sigma = \sum_i a_i$.
As it turns out from applications, producing an estimate with additive error $u_i$ (for a single $a_i$)
incurs cost $1/u_i$,$^4$ hence the goal is to achieve a good approximation to $\sigma$ while keeping tabs on the
total cost (total precision) $\sum_i (1/u_i)$.

To illustrate the concept, consider the case where $10 \le \sigma \le 20$, and one desires a 1.1 multiplicative
approximation to $\sigma$. How should one choose the precisions $u_i$? One approach is to employ
the aforementioned sampling approach: choose a random set of indices $I \subset [n]$ and assign to them
a high precision, say $u_i = 1/n$, and assign trivial precision $u_i = 1$ to the rest of the indices; then
report the estimate $\hat\sigma = \frac{n}{|I|} \sum_{i \in I} \hat a_i$. This way, the error due to the adversary's response is at most
$\frac{n}{|I|} \sum_{i \in I} |\hat a_i - a_i| \le 1$, and standard sampling (concentration) bounds prescribe setting $|I| = \Theta(n)$.
The total precision becomes $\Theta(n \cdot |I|) = \Theta(n^2)$, which is no better than naively setting all precisions
$u_i = 1/n$, which achieves total additive error 1 using total precision $n^2$. Note that in the restricted
case where all $a_i \le 40/n$, the sampling approach is better, because setting $|I| = O(1)$ suffices;
however, in another restricted case where all $a_i \in \{0, 1\}$, the naive approach could fare better, if we
set all $u_i = 1/2$. Thus, total precision $O(n)$ is possible in both cases, but by a different method. We
previously proved in [AKO10] that one can always choose $w_i = 1/u_i$ randomly such that
$\sum_i w_i \le O(n \log n)$ with constant probability.

In this paper, we provide a more efficient version of PSL (see Section 2 for details). To state
the lemma, we need a definition that accommodates both additive and multiplicative errors.

4 Naturally, in other applications, other notions of cost may make more sense, and are worth investigating.
Definition 1.1 (Approximator). Let $\rho > 0$ and $f \in [1,2]$. A $(\rho, f)$-approximator to $\tau > 0$ is any
quantity $\hat\tau$ satisfying $\tau/f - \rho \le \hat\tau \le f\tau + \rho$. (Without loss of generality, $\hat\tau \ge 0$.)
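
As a small illustration of Definition 1.1, the following Python predicate checks the two-sided condition directly. The function name `is_approximator` and its argument names are chosen here for illustration and do not come from the paper.

```python
def is_approximator(estimate, true_value, rho, f):
    """Return True if `estimate` is a (rho, f)-approximator to `true_value`.

    The condition combines a multiplicative slack f and an additive
    slack rho: true_value / f - rho <= estimate <= f * true_value + rho.
    """
    return true_value / f - rho <= estimate <= f * true_value + rho
```

With $f = 1$ the condition reduces to a purely additive error of $\rho$, and with $\rho = 0$ to a purely multiplicative error of $f$.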

The following lemma is stated in a rather general form. Due to historical reasons, the lemma
refers to precisions as $w_i \in [1, \infty)$, which is identical to our description above via $w_i = 1/u_i$. Upon
first reading, it may be instructive to consider the special case $f = 1$, and let $\rho = \epsilon > 0$ be an
absolute constant (say 0.1 to match our discussion above).

Lemma 1.2 (Precision Sampling Lemma). Fix an integer $n \ge 2$, a multiplicative error $\epsilon \in
[1/n, 1/3]$, and an additive error $\rho \in [1/n, 1]$. Then there exist a distribution $W$ on the real interval
$[1, \infty)$ and a reconstruction algorithm $R$, with the following two properties.

Accuracy: Consider arbitrary $a_1, \ldots, a_n \in [0,1]$ and $f \in [1, 1.5]$. Let $w_1, \ldots, w_n$ be chosen at
random from $W$ pairwise independently.$^5$ Then with probability at least 2/3, when algorithm
$R$ is given $\{w_i\}_{i \in [n]}$ and $\{\hat a_i\}_{i \in [n]}$ such that each $\hat a_i$ is an arbitrary $(1/w_i, f)$-approximator of
$a_i$, it produces $\hat\sigma \ge 0$ which is a $(\rho, f \cdot e^{\epsilon})$-approximator to $\sigma = \sum_{i=1}^n a_i$.

Cost: There is $k = O(1/\rho\epsilon^2)$ such that the conditional expectation $\mathbb{E}_{w \in W}[w \mid M] \le O(k \log n)$ for
some event $M = M(w)$ occurring with high probability. For every fixed $\alpha \in (0,1)$, we have
$\mathbb{E}_{w \in W}[w^{\alpha}] \le O(k^{\alpha})$. The distribution $W = W(k)$ depends only on $k$.

We emphasize that the probability 2/3 above is over the choice of $\{w_i\}_{i \in [n]}$ and holds (separately)
for every fixed setting of $\{a_i\}_{i \in [n]}$. In the case where $R$ is randomized, the probability 2/3 is also
over the coins of $R$. Note also that the precisions $w_i$ are chosen without knowing $a_i$, but the
estimators $\hat a_i$ are adversarial: each might depend on the entire $\{a_i\}_{i \in [n]}$ and $\{w_i\}_{i \in [n]}$, and their
errors might be correlated.

In our implementation, it turns out that the reconstruction algorithm uses only $\hat a_i$'s which are
(retrospectively) good approximations to $a_i$, namely $\hat a_i \gg 1/w_i$; hence the adversarial effect is
limited. For completeness, we also mention that, for $k = 1$, the distribution $W = W(1)$ is simply
$1/u$ for a random $u \in [0,1]$. We present the complete proof of the lemma in Section 2.

It is natural to ask whether the above lemma is tight. In Section 7, we show a lower bound on
$\mathbb{E}_{w \in W}[w]$ in the considered setting, which matches our PSL bound up to a factor of $1/\epsilon$. We leave
it as an open question what is the best achievable bound for PSL.



1.3 Connection to Priority Sampling 

We remark that (our implementation of) Precision Sampling has some similarity to Priority
Sampling [DLT07], which is a scheme for the following problem.$^6$ We are given a vector $x \in \mathbb{R}^n$ of
positive weights (coordinates), and we want to maintain a sample of $k$ weights in order to be able
to estimate sums of weights for an arbitrary subset of coordinates, i.e., $\sum_{i \in I} x_i$ for arbitrary sets
$I \subseteq [n]$. Priority Sampling has been shown to attain an essentially best possible variance for a
sampling scheme [Sze06].



The similarity between the two sampling schemes is the following. In our main approach,
similarly to the approach in Priority Sampling, we take the vector $x \in \mathbb{R}^n$, and consider a vector $y$
where $y_i = x_i/u_i$, for $u_i$ chosen at random from $[0,1]$. We are then interested in heavy hitters of the
vector $y$ (in $\ell_1$ norm). We obtain these using the CountSketch/CountMin sketch [CCFC02, CM05a].
In Priority Sampling, one similarly extracts a set of $k$ heaviest coordinates of $y$. However, one
important difference is that in Priority Sampling the weights (and updates) are positive, thus
making it possible to use Reservoir sampling-type techniques to obtain the desired heavy hitters.
In contrast, in our setting the weights (and updates) may be negative, and we need to extract the
heavy hitters approximately and hence post-process them differently.

See also [CDK+09] and the references therein for streaming-friendly versions of Priority
Sampling and other related sampling procedures.

5 That is, for all $i < j$, the pair $(w_i, w_j)$ is distributed as $W^2$.

6 The similarity is at the more technical level of applying the PSL in streaming algorithms, hence the foregoing
discussion actually refers to Sections 3 and 4.



2 Proof of the Precision Sampling Lemma 



In this section we prove the Precision Sampling Lemma (Lemma 1.2). Compared to our previous
version of PSL from [AKO10], this version has the following improvements: a better bound on
$\mathbb{E}_{w \in W}[w]$ (hence better total precision), it requires the $w_i$'s to be only pairwise independent (hence
streaming-friendly), and a slightly simpler construction and analysis via its inverse $u = 1/w$. We
also show a lower bound in Section 7.

The probability distribution W. Fix $k = \zeta/\rho\epsilon^2$ for a sufficiently large constant $\zeta > 0$. The
distribution $W$ takes a random value $w \in [1, \infty)$ as follows: pick i.i.d. samples $u_1, \ldots, u_k$ from the
uniform distribution $U(0,1)$, and set $w = \max_{j \in [k]} 1/u_j$. Note that $W$ depends on $k$ only.
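
The definition of $W$ translates almost verbatim into code. The sketch below is a direct, unoptimized sampler; the function name `sample_precision` is hypothetical, and it uses fully independent draws for simplicity even though the lemma only needs the $w_i$'s to be pairwise independent.

```python
import random


def sample_precision(k, rng=random):
    """Draw w from W(k): the maximum of 1/u_j over k i.i.d. uniform u_j.

    Equivalently, w is the reciprocal of the smallest of the k uniforms.
    """
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1],
    # which rules out a division by zero.
    u_min = min(1.0 - rng.random() for _ in range(k))
    return 1.0 / u_min
```

Every sample is at least 1, and larger $k$ pushes the distribution toward larger precisions, since the minimum of $k$ uniforms concentrates around $1/k$.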

The reconstruction algorithms. The randomized reconstruction algorithm $R'$ gets as input
$\{w_i\}_{i \in [n]}$ and $\{\hat a_i\}_{i \in [n]}$ and works as follows. For each $i \in [n]$, sample $k$ i.i.d. random variables,
$u_{i,j} \in U(0,1)$ for $j \in [k]$, conditioned on the event $\{w_i = \max_{j \in [k]} 1/u_{i,j}\}$. Now define the "indicators"
$s_{i,j} \in \{0, 1/k\}$, for each $i \in [n]$, $j \in [k]$, by setting

$$s_{i,j} \stackrel{\text{def}}{=} \begin{cases} 1/k & \text{if } u_{i,j} \le \hat a_i/t \text{ for } t = 4/\epsilon; \\ 0 & \text{otherwise.} \end{cases}$$

Finally, algorithm $R'$ sets $s = \sum_{i \in [n], j \in [k]} s_{i,j}$ and reports $\hat\sigma = st$ as an estimate for $\sigma = \sum_i a_i$.
A key observation is that altogether, i.e., when we consider both the coins involved in the choice
of $w_i$ from $W$ as well as those used by algorithm $R'$, we can think of $u_{i,1}, \ldots, u_{i,k}$ as being chosen
i.i.d. from $U(0,1)$. Observe also that whenever $\hat a_i$ is a $(1/w_i, f)$-approximator to $a_i$, it is also a
$(u_{i,j}, f)$-approximator to $a_i$ for all $j \in [k]$.

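We now build a more efficient deterministic algorithm $R$ that performs at least as well as
$R'$. Specifically, $R$ does not generate the $u_{i,j}$'s (from the given $w_i$'s), but rather sets
$s_i = \mathbb{E}\big[\sum_{j \in [k]} s_{i,j} \mid \min_{j \in [k]} u_{i,j} = 1/w_i\big]$ and $s = \sum_{i \in [n]} s_i$. A simple calculation yields an explicit
formula, which is easy to compute algorithmically:

$$s_i = \begin{cases} \dfrac{1}{k} + \dfrac{k-1}{k} \cdot \dfrac{\hat a_i w_i/t - 1}{w_i - 1} & \text{if } \hat a_i w_i/t \ge 1; \\ 0 & \text{otherwise.} \end{cases}$$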
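
The explicit formula for the deterministic algorithm $R$ can be sketched in a few lines of Python. This is an illustrative rendering under stated assumptions rather than the authors' code: the name `psl_reconstruct`, the guard for $w_i = 1$, and the clamp of the conditional probability to 1 are additions made here to keep the function total on arbitrary inputs.

```python
def psl_reconstruct(estimates, precisions, eps, k):
    """Deterministic reconstruction R: return t * sum_i s_i with t = 4 / eps."""
    t = 4.0 / eps
    total = 0.0
    for a_hat, w in zip(estimates, precisions):
        if a_hat * w < t:
            continue  # s_i = 0: the estimate is below the threshold t / w_i
        if w <= 1.0:
            frac = 1.0  # degenerate case u = 1/w = 1 (guard added in this sketch)
        else:
            # Probability that a uniform draw on (1/w, 1) is at most a_hat / t,
            # clamped to 1 as a safety measure (an assumption of this sketch).
            frac = min(1.0, (a_hat * w / t - 1.0) / (w - 1.0))
        # The minimal u contributes 1/k; the other k - 1 draws contribute
        # 1/k each with probability frac.
        total += 1.0 / k + (k - 1.0) / k * frac
    return t * total
```

For instance, with exact estimates $\hat a_i = a_i$ and precisions drawn from $W(k)$, the output is an unbiased estimate of $\sigma$, since each indicator $s_{i,j}$ has expectation $a_i/(tk)$.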

We proceed to the analysis of this construction. We will first consider the randomized algorithm 
R', and then show that derandomization can only decrease the error. 



Proof of Lemma 1.2. We first give bounds on the moments of the distribution $W$. Indeed, recall
that by definition $w = \max_{j \in [k]} 1/u_j$. We define the event $M$ to be that $w \le n^5$; note that
$\Pr[M] \ge 1 - k \cdot n^{-5} \ge 1 - O(n^{-2})$. Conditioned on $M$, each $u_j \in U(n^{-5}, 1)$, and we have

$$\mathbb{E}[1/u_j \mid M] = \frac{1}{1 - n^{-5}} \int_{n^{-5}}^{1} \frac{1}{x}\,dx = \frac{\ln(n^5)}{1 - n^{-5}} \le O(\log n).$$

Thus

$$\mathbb{E}_{w \in W}[w \mid M] \le \mathbb{E}\Big[\sum_{j \in [k]} 1/u_j \,\Big|\, M\Big] \le O(k \log n).$$

Now fix $\alpha \in (0,1)$. It is immediate that $\mathbb{E}[1/u^{\alpha}] = O(1/(1-\alpha))$. We can similarly prove that
$\mathbb{E}_{w \in W}[w^{\alpha}] \le O(k^{\alpha}/(1-\alpha))$, but the calculation is technical, and we include its proof in Appendix A.

We now need to prove that $\hat\sigma$ is an approximator to $\sigma$, with probability at least 2/3. The plan
is to first compute the expectation of $s_{i,j}$, for each $i \in [n]$, $j \in [k]$. This expectation depends on the
approximator values $\hat a_i$, which itself may depend (adversarially) on $w_i$, so instead we give upper and
lower bounds on the expectation $\mathbb{E}[s_{i,j}] \approx \frac{a_i}{tk}$. Then, we wish to apply a concentration bound on
the sum of $s_{i,j}$, but again the $s_{i,j}$ might depend on the random values $w_i$, so we actually apply the
concentration bound on the upper/lower bounds of $s_{i,j}$, and thereby derive bounds on $s = \sum s_{i,j}$.

Formally, we define random variables $\overline{s}_{i,j}, \underline{s}_{i,j} \in \{0, 1/k\}$. We set $\overline{s}_{i,j} = 1/k$ iff $u_{i,j} < f a_i/(t-1)$,
and 0 otherwise. Similarly, we set $\underline{s}_{i,j} = 1/k$ iff $u_{i,j} \le a_i/f(t+1)$, and 0 otherwise. We now claim
that

$$\underline{s}_{i,j} \le s_{i,j} \le \overline{s}_{i,j}. \tag{1}$$

Indeed, if $s_{i,j} = 1/k$ then $u_{i,j} \le \hat a_i/t$, and hence, using the fact that $\hat a_i$ is a $(u_{i,j}, f)$-approximator
to $a_i$, we have $u_{i,j} \le f a_i/(t-1)$, or $\overline{s}_{i,j} = 1/k$. Similarly, if $s_{i,j} = 0$, then $u_{i,j} > \hat a_i/t$, and hence
$u_{i,j} > a_i/f(t+1)$, or $\underline{s}_{i,j} = 0$. Notice for later use that each of $\{\overline{s}_{i,j}\}$ and $\{\underline{s}_{i,j}\}$ is a collection
of $nk$ pairwise independent random variables. For ease of notation, define $\overline{\sigma} = t \sum_{i,j} \overline{s}_{i,j}$ and
$\underline{\sigma} = t \sum_{i,j} \underline{s}_{i,j}$, and observe that $\underline{\sigma} \le \hat\sigma \le \overline{\sigma}$.

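We now bound $\mathbb{E}[\overline{s}_{i,j}]$ and $\mathbb{E}[\underline{s}_{i,j}]$. For this, it suffices to compute the probability that $\overline{s}_{i,j}$ and
$\underline{s}_{i,j}$ are $1/k$. For the first quantity, we have:

$$\Pr\big[\overline{s}_{i,j} = 1/k\big] = \Pr\Big[u_{i,j} < \frac{f a_i}{t-1}\Big] \le \frac{f a_i}{t-1} \le e^{\epsilon/2} \cdot \frac{f a_i}{t}, \tag{2}$$

where we used the fact that $t - 1 \ge e^{-\epsilon/2} t$. Similarly, for the second quantity, we have:

$$\Pr\big[\underline{s}_{i,j} = 1/k\big] = \Pr\Big[u_{i,j} \le \frac{a_i}{f(t+1)}\Big] = \frac{a_i}{f(t+1)} \ge e^{-\epsilon/2} \cdot \frac{a_i}{f t}. \tag{3}$$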

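Finally, using Eqn. (1) and the fact that $\mathbb{E}[s] = \sum_{i,j} \mathbb{E}[s_{i,j}]$, we can bound the expectation and
variance of $\hat\sigma = st$ as follows:

$$e^{-\epsilon/2} f^{-1} \cdot \sigma \le t \sum_{i,j} \mathbb{E}[\underline{s}_{i,j}] \le \mathbb{E}[\hat\sigma] \le t \sum_{i,j} \mathbb{E}[\overline{s}_{i,j}] \le e^{\epsilon/2} f \cdot \sigma, \tag{4}$$

and, using pairwise independence, $\mathrm{Var}[\overline{\sigma}], \mathrm{Var}[\underline{\sigma}] \le t^2 \cdot \sum_{i,j} k^{-2} \cdot e^{\epsilon/2} \cdot \frac{f a_i}{t} \le 4tk^{-1}\sigma$. Recall that
we want to bound the probability that $\underline{\sigma}$ and $\overline{\sigma}$ deviate (additively) from their expectation by
roughly $\epsilon\sigma + \rho$, which is larger than their standard deviation $O(\sqrt{tk^{-1}\sigma}) = O(\sqrt{\rho\epsilon\sigma})$.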

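Formally, to bound the quantity $\hat\sigma$ itself, we distinguish two cases. First, consider $\sigma > \rho/\epsilon$.
Then for our parameters $k = \zeta/\rho\epsilon^2$ and $t = 4/\epsilon$,

$$\Pr\big[\hat\sigma > e^{\epsilon/2} f \sigma \cdot (1 + \epsilon/2)\big] \le \Pr\big[\overline{\sigma} - \mathbb{E}[\overline{\sigma}] > \epsilon/2 \cdot e^{\epsilon/2} f \sigma\big] \le \frac{\mathrm{Var}[\overline{\sigma}]}{(\epsilon/2 \cdot e^{\epsilon/2} f \sigma)^2} \le \frac{4tk^{-1}\sigma}{\epsilon^2\sigma^2/4} = \frac{64\rho}{\zeta\epsilon\sigma} \le \frac{64}{\zeta} \le 0.1$$

for sufficiently large $\zeta$. Similarly, $\Pr[\hat\sigma < f^{-1} e^{-\epsilon/2} \sigma \cdot e^{-\epsilon/2}] \le 0.1$.
Now consider the second case, when $\sigma \le \rho/\epsilon$. Then we have

$$\Pr\big[\hat\sigma > f e^{\epsilon/2} \sigma + \rho\big] \le \Pr\big[\overline{\sigma} - \mathbb{E}[\overline{\sigma}] > \rho\big] \le \frac{\mathrm{Var}[\overline{\sigma}]}{\rho^2} \le \frac{4tk^{-1}\rho/\epsilon}{\rho^2} \le 0.1.$$

Similarly, we have $\Pr[\hat\sigma < f^{-1} e^{-\epsilon/2} \sigma - \rho] \le 0.1$. This completes the proof that $\hat\sigma$ is a $(\rho, f e^{\epsilon})$-
approximator to $\sigma$, with probability at least 2/3.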

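Finally, we argue that switching to the deterministic algorithm $R$ only decreases the variances
without affecting the expectations, and hence the same concentration bounds hold. Formally,
denote our replacement for $s_i$ by $s'_i = \mathbb{E}\big[\sum_{j \in [k]} s_{i,j} \mid \max_{j \in [k]} 1/u_{i,j} = w_i\big]$, and note it is a
random variable (because of $w_i$). Define $\overline{s}'_i = \mathbb{E}\big[\sum_{j \in [k]} \overline{s}_{i,j} \mid \max_{j \in [k]} 1/u_{i,j} = w_i\big]$, and by applying
conditional expectation to Eqn. (1), we have $s'_i \le \overline{s}'_i$. We now wish to bound the variance of $\sum_i \overline{s}'_i$.
By the law of total variance, and using the shorthand $\vec{w} = \{w_i\}_i$,

$$\mathrm{Var}\Big[\sum_{i,j} \overline{s}_{i,j}\Big] = \mathbb{E}\Big[\mathrm{Var}\Big[\sum_{i,j} \overline{s}_{i,j} \,\Big|\, \vec{w}\Big]\Big] + \mathrm{Var}\Big[\mathbb{E}\Big[\sum_{i,j} \overline{s}_{i,j} \,\Big|\, \vec{w}\Big]\Big]. \tag{5}$$

We now do a similar calculation for $\sum_i \overline{s}'_i$, but since each $\overline{s}'_i$ is completely determined from the
known $\vec{w}$, the first summand is just 0, and in the second summand we can change each $\overline{s}'_i$ to $\sum_j \overline{s}_{i,j}$,
formally

$$\mathrm{Var}\Big[\sum_i \overline{s}'_i\Big] = \mathbb{E}\Big[\mathrm{Var}\Big[\sum_i \overline{s}'_i \,\Big|\, \vec{w}\Big]\Big] + \mathrm{Var}\Big[\mathbb{E}\Big[\sum_i \overline{s}'_i \,\Big|\, \vec{w}\Big]\Big] = \mathrm{Var}\Big[\mathbb{E}\Big[\sum_{i,j} \overline{s}_{i,j} \,\Big|\, \vec{w}\Big]\Big]. \tag{6}$$

Eqns. (5) and (6) imply that in the deterministic algorithm the variance (of the upper bound) can
indeed only decrease. The analysis for the lower bound is analogous, using $\underline{s}'_i$. As before, using the
fact that the $\overline{s}'_i$ are pairwise independent (because the $w_i$ are) we apply Chebyshev's inequality to
bound the deviation for the algorithm $R$'s actual estimate $\hat\sigma = t \sum_i s'_i$. □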



3 Applications I: Warm-Up 

We now describe our streaming algorithms that use the Precision Sampling Lemma (PSL) as the
core primitive. We first outline two generic procedures that are used by several of our applications.
The current description leaves some parameters unspecified: they will be fixed by the particular
applications. These two procedures are also given in pseudo-code as Alg. 1 and Alg. 2.

As previously mentioned, our sketch function is a linear function $L : \mathbb{R}^n \to \mathbb{R}^S$ mapping an
input vector $x \in \mathbb{R}^n$ into $\mathbb{R}^S$, where $S$ is the space (in words). The algorithm is simply a fusion
of PSL with a heavy hitters algorithm [CCFC02, CM05a]. We use a parameter $p \ge 1$, which one
should think of as the $p$ in the $\ell_p$-norm estimation problem, and $p = k$ in the $F_k$ moment estimation.
Other parameters are: $\rho \in (0,1)$ (additive error), $\epsilon \in (0, 1/3)$ (multiplicative error), and $m \in \mathbb{N}$ (a
factor in the space usage).

The sketching algorithm is as follows. We start by initializing a vector of $w_i$'s using Lemma 1.2:
specifically we draw $w_i$'s from $W = W(k)$ for $k = \zeta\rho^{-1}\epsilon^{-2}$. We use $l = O(\log n)$ hash tables $\{H_j\}_{j \in [l]}$,
each of size $m$. For each hash table $H_j$, choose a random hash function $h_j : [n] \to [m]$, and
Rademacher random variables $g_j : [n] \to \{-1, +1\}$. Then the sketch $Lx$ is obtained by repeating
the following for every hash table $j \in [l]$ and index $i \in [n]$: hash index $i \in [n]$ to find its cell $h_j(i)$,
and add to this cell's contents the quantity $g_j(i) \cdot x_i w_i^{1/p}$. Overall, $S = lm$.
The estimation algorithm $E$ proceeds as follows. First normalize the sketch $Lx$ by scaling it
down by an input parameter $r \in \mathbb{R}_+$. Now for each $i \in [n]$, compute the median, over the $l$ hash
tables, of the $p$th power of cells where $i$ falls into. Namely, let $\hat x_i$ be the median of $|H_j(h_j(i))|^p/(r w_i)$
over all $j \in [l]$. Then run the PSL reconstruction algorithm $R$ on the vectors $\{\hat x_i\}_{i \in [n]}$ and $\{w_i\}_{i \in [n]}$,
to obtain an estimate $\hat\sigma = \hat\sigma(r)$. The final output is $r \cdot \hat\sigma(r)$.

We note that it will always suffice to use pairwise independence for each set of random variables
$\{w_i\}_i$, $\{g_j(i)\}_i$, and $\{h_j(i)\}_i$ for each $j \in [l]$. For instance, it suffices to draw each hash function $h_j$
from a universal hash family.

Finally, we remark that, while the reconstruction Alg. 2 takes time $\Omega(n)$, one can reduce this
to time $m \cdot (\epsilon^{-1} \log n)^{O(1)}$ by using a more refined heavy hitter sketch. We discuss this issue later
in this section.

Algorithm 1: Sketching algorithm for norm estimation. Input is a vector $x \in \mathbb{R}^n$. Parameters
$p$, $\epsilon$, $\rho$, and $m$ are specified later.

1 Generate $\{w_i\}_{i \in [n]}$ as prescribed by PSL, using $W = W(k)$ for $k = \zeta\rho^{-1}\epsilon^{-2}$.

2 Initialize $l = O(\log n)$ hash tables $H_1, \ldots, H_l$, each of size $m$. For each table $H_j$, choose a
random hash function $h_j : [n] \to [m]$ and a random $g_j : [n] \to \{-1, +1\}$.

3 for each $j \in [l]$ do

4 Multiply $x$ coordinate-wise with the vectors $\{w_i^{1/p}\}_{i \in [n]}$ and $g_j$, and hash the resulting
vector into the hash table $H_j$. Formally, $H_j(z) = \sum_{i : h_j(i) = z} g_j(i) \cdot w_i^{1/p} \cdot x_i$.

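Algorithm 2: Reconstruction algorithm for norm estimation. Input consists of $l$ hash tables
$H_j$, precisions $w_i$ for $i \in [n]$, and a real $r > 0$. Other parameters, $p$, $\epsilon$, $\rho$, $m$, are as in Alg. 1.

1 For each $i \in [n]$, compute $\hat x_i = \mathrm{median}_{j \in [l]} \big\{ |H_j(h_j(i))|^p / (r w_i) \big\}$.

2 Apply PSL reconstruction algorithm $R$ to vector $(\hat x_1, \ldots, \hat x_n)$ and $(w_1, \ldots, w_n)$, and let $\hat\sigma$ be
its output. Explicitly, for each $i \in [n]$, if $\hat x_i w_i > t = 4/\epsilon$, then set $s_i = \frac{1}{k} + \frac{k-1}{k} \cdot \frac{\hat x_i w_i/t - 1}{w_i - 1}$
(recall $k = \zeta\rho^{-1}\epsilon^{-2}$ from PSL), otherwise $s_i = 0$; then, let $\hat\sigma = t \sum_i s_i$.

3 Output $r \cdot \hat\sigma$.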
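
The two procedures can be put together in a short, unoptimized Python sketch. It is meant only to illustrate the data flow of Alg. 1 and Alg. 2 under stated assumptions: the function names `build_sketch` and `estimate` are hypothetical, hash and sign functions are stored as explicit fully random tables rather than drawn from a pairwise independent family, and the clamp to 1 and the guard `w_i > 1` are safety additions not present in the pseudo-code.

```python
import random
import statistics


def build_sketch(x, w, m, l, p, rng):
    """Alg. 1: H_j(z) = sum over i with h_j(i) = z of g_j(i) * w_i^(1/p) * x_i."""
    n = len(x)
    h = [[rng.randrange(m) for _ in range(n)] for _ in range(l)]
    g = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(l)]
    tables = [[0.0] * m for _ in range(l)]
    for j in range(l):
        for i in range(n):
            tables[j][h[j][i]] += g[j][i] * w[i] ** (1.0 / p) * x[i]
    return tables, h


def estimate(tables, h, w, r, p, eps, k):
    """Alg. 2: per-coordinate medians, then the explicit PSL reconstruction."""
    t = 4.0 / eps
    total = 0.0
    for i, w_i in enumerate(w):
        # Median over the l tables of |H_j(h_j(i))|^p / (r * w_i).
        x_hat = statistics.median(
            abs(tables[j][h[j][i]]) ** p / (r * w_i) for j in range(len(tables))
        )
        if x_hat * w_i > t and w_i > 1.0:
            frac = min(1.0, (x_hat * w_i / t - 1.0) / (w_i - 1.0))
            total += 1.0 / k + (k - 1.0) / k * frac
    return r * t * total
```

In a toy run with $p = 1$, one would draw each $w_i$ from $W(k)$, call `build_sketch`, and pass an upper bound $r \ge \|x\|_1$ to `estimate`. At small $n$ the constants in the analysis force $m$ to be large, so such a run demonstrates correctness of the pipeline rather than any space saving.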



3.1 Estimating Moments for k > 2 



We now present the algorithm for estimating $F_k$ moments for $k > 2$, using the PSL (Lemma 1.2).
To reduce the clash of parameters, we refer to the problem as "$F_p$ moment estimation".

Theorem 3.1. Fix $n \ge 8$, $p > 2$, and $0 < \epsilon < 1/3$. There is a randomized linear function
$L : \mathbb{R}^n \to \mathbb{R}^S$, with $S = O(n^{1-2/p} \cdot p^2 \epsilon^{-2-4/p} \log n)$, and a deterministic estimation algorithm
$E : \mathbb{R}^S \to \mathbb{R}$, such that for every $x \in \mathbb{R}^n$, with probability at least 0.51, its output $E(L(x))$
approximates $\|x\|_p^p$ within factor $1 + \epsilon$.



Proof of Theorem 3.1. Our linear sketch $L$ is Alg. 1 and the estimation algorithm $E$ is Alg. 2 with
the following choice of parameters. Let $\rho = \frac{\epsilon}{4 n^{p/2-1}}$. Let $W = W(k)$, for $k = \zeta\rho^{-1}\epsilon^{-2}$, be from PSL
(Lemma 1.2). Define $\omega = 9\,\mathbb{E}_{w \in W}[w^{2/p}]$, and note that $\omega \le O(\rho^{-2/p}\epsilon^{-4/p})$ by Lemma 1.2. Finally
we set $m = \alpha \cdot O(\rho^{-2/p}\epsilon^{-4/p})$ so that $m \ge \alpha\omega$, where $\alpha = \alpha(p, \epsilon) > 1$ will be determined later.

In Alg. 2, we set $r$ to be a factor $1 - 1/p$ approximation to $\|x\|_2$, i.e., $(1 - 1/p)\|x\|_2 \le r \le \|x\|_2$.
Note that such $r$ is easy to compute (with high probability) using, say, the AMS linear sketch
[AMS99], with $O(p^2 \log n)$ additional space. Thus, for the rest, we will just assume that $\|x\|_2 \in
[1 - 1/p, 1]$ and set $r = 1$.

The plan is to apply PSL (Lemma 1.2) where each unknown value $a_i$ is given by $|x_i|^p$, and each
estimate $\hat a_i$ is given by $\hat x_i$. For this purpose, we need to prove that the $\hat x_i$'s are good approximators.

We thus let $F_2 = \sum_{i=1}^n (x_i w_i^{1/p})^2$. Note that $\mathbb{E}[F_2] = \|x\|_2^2 \cdot \mathbb{E}_{w \in W}[w^{2/p}] \le \omega/9$, and hence by
Markov's inequality, with probability at least 8/9 we have $F_2 \le \omega$.

Claim 3.2. Assume that $F_2 \le \omega$. Then with high probability (say $\ge 1 - 1/n^2$) over the choice of
the hash tables, for every $i \in [n]$ the value $\hat x_i$ is a $(1/w_i, e^{\epsilon})$-approximator to $|x_i|^p$.

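Proof. We shall prove that for each $i \in [n]$ and $j \in [l]$, with probability $\ge 8/9$ over the choice of $h_j$
and $g_j$, the value $|H_j(h_j(i))|^p/w_i$ is a $(1/w_i, e^{\epsilon})$-approximator to $|x_i|^p$. Since each $\hat x_i$ is the median
of $|H_j(h_j(i))|^p/w_i$ over $l = O(\log n)$ values of $j$, we get by applying a Chernoff bound that with
high probability it is a $(1/w_i, e^{\epsilon})$-approximator to $|x_i|^p$. The claim then follows by a union bound
over all $i \in [n]$.

Fix $i \in [n]$ and $j \in [l]$, and let $Y = H_j(h_j(i))$. For $f \in [n]$, define $y_f = g_j(f) \cdot x_f w_f^{1/p}$ if
$h_j(f) = h_j(i)$ and 0 otherwise. Then $Y = y_i + \delta$ where $\delta = \sum_{f \ne i} y_f$. Ideally, we would like
that $|Y|^p \approx |y_i|^p = |x_i|^p w_i$, i.e., the effect of the error $\delta$ is small. Indeed,
$\mathbb{E}[\delta^2] = \mathbb{E}\big[(\sum_{f \ne i} y_f)^2\big] = \frac{1}{m} \sum_{f \ne i} (x_f w_f^{1/p})^2 \le F_2/m$. Hence, by Markov's inequality,
$|\delta| \le \sqrt{9F_2/m} \le 3/\sqrt{\alpha}$ with probability at least 8/9.

We now argue that if this event $|\delta| \le 3/\sqrt{\alpha}$ occurs, then $|H_j(h_j(i))|^p/w_i = |x_i + \delta/w_i^{1/p}|^p$
is a good approximator to $|x_i|^p$. Indeed, if $|\delta|/w_i^{1/p} \le \frac{\epsilon}{2p}|x_i|$, then clearly
$|x_i + \delta/w_i^{1/p}|^p = (1 \pm \frac{\epsilon}{2p})^p |x_i|^p$. Otherwise, since $|\delta| \le 3/\sqrt{\alpha}$, we have that

$$\begin{aligned}
\big|\, |Y|^p - |x_i w_i^{1/p}|^p \big| &\le (|x_i w_i^{1/p}| + |\delta|)^p - |x_i w_i^{1/p}|^p \\
&\le \big(\tfrac{2p}{\epsilon}|\delta| + |\delta|\big)^p - \big(\tfrac{2p}{\epsilon}|\delta|\big)^p \\
&\le |\delta|^p \cdot (2p/\epsilon)^p \cdot \big((1 + \tfrac{\epsilon}{2p})^p - 1\big) \\
&\le (6p)^p \cdot \epsilon^{1-p}/\alpha^{p/2}.
\end{aligned}$$

If we set $\alpha = (6p)^2/\epsilon^{2-2/p}$, then we obtain that $|Y|^p/w_i$ is a $(1/w_i, e^{\epsilon})$-approximator to $|x_i|^p$ with
probability at least 8/9. We now take the median over $O(\log n)$ hash tables and apply a union bound
to reach the desired conclusion. □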



We can now complete the proof of Theorem 3.1. Apply PSL (Lemma 1.2) with $a_i = |x_i|^p$
and $\hat a_i = \hat x_i$. By Hölder's inequality for $p/2$ and the normalization $r = 1$, we have
$\|x\|_p^p \ge \|x\|_2^p/n^{p/2-1} \ge \rho/\epsilon$, and thus additive error $\rho$ transforms to multiplicative error $1 + \epsilon$. It remains
to bound the space: $S \le O(m \log n) = O(\alpha\rho^{-2/p}\epsilon^{-4/p} \log n) = O(p^2/\epsilon^{2-2/p} \cdot \epsilon^{-6/p} n^{1-2/p} \cdot \log n) =
O(p^2 n^{1-2/p} \cdot \epsilon^{-2-4/p} \cdot \log n)$. □
3.2 Estimating $\ell_1$ Norm

To further illustrate the use of Alg. 1 and 2, we now show how to use them for estimating the
$\ell_1$ norm. In a later section, we obtain similar results for all $\ell_p$, $p \in [1,2]$, except
that the analysis is more involved.

We obtain the following theorem. For clarity of presentation, the efficiency (space and runtime 
bounds) are discussed separately below. 

Theorem 3.3. Fix $n \ge 8$ and $8/n < \epsilon < 1/8$. There is a randomized linear function
$L : R^n \to R^S$, with $S = O(\epsilon^{-3} \log^2 n)$, and a deterministic estimation algorithm
$E : R^S \to R$, such that for every $x \in R^n$, with probability at least 0.51, its output $E(L(x))$
approximates $\|x\|_1$ within factor $1 + \epsilon$.

Proof. The sketch function $L$ is given by Alg. 1, with parameters $p = 1$, $\rho = \epsilon/8$, and
$m = C \epsilon^{-3} \log n$ for a constant $C > 0$ defined shortly. Let $W = W(k)$ for
$k = \zeta \rho^{-1} \epsilon^{-2}$ be obtained from the PSL (Lemma 1.2). Define
$\omega = 10 \, E_{w \in W}[w \mid M]$, where event $M = M(w)$ satisfies $\Pr[M] \ge 1 - O(n^{-2})$.
Note that $\omega \le O(\epsilon^{-3} \log n)$. We set the constant $C$ such that $m \ge 3\omega$.

The estimation procedure is just several invocations of Alg. 2 for different values of $r$. For the
time being, assume we hold an overestimate of $\|x\|_1$, which we call $r \ge \|x\|_1$. Then algorithm
$E$ works by applying Alg. 2 with this parameter $r$.

Let $F_1 = \sum_{i=1}^n |x_i w_i|/r$. Note that
$E[F_1 \mid \cap_i M(w_i)] = \|x\|_1/r \cdot E_{w \in W}[w \mid M(w)] \le \omega/10$, and hence by
Markov's inequality, $F_1 \le \omega \le m/3$ with probability at least $9/10 - O(n/n^2) \ge 8/9$. Call
this event $\mathcal{E}_r$, and assume henceforth it indeed occurs.

To apply the PSL, we need to prove that each $\hat{x}_i$ in Alg. 2 is a good approximator to $x_i$.
Fix $i \in [n]$ and $j \in [l]$. We claim that, conditioned on $\mathcal{E}_r$, with probability at
least 2/3, $|H_j(h_j(i))|/(r w_i)$ is a $(1/w_i, 1)$-approximator of $|x_i|/r$. Indeed,
$H_j(h_j(i)) = g_j(i) w_i x_i + \sum_{f \ne i : h_j(f) = h_j(i)} g_j(f) w_f x_f$, and thus,
$$E\left| \frac{|H_j(h_j(i))|}{r w_i} - \frac{|x_i|}{r} \right|
\le \frac{1}{r w_i} \sum_{f \ne i} \frac{1}{m} |w_f x_f| \le \frac{F_1}{m w_i} \le \frac{1}{3 w_i}.$$

Hence, by Markov's inequality, $|H_j(h_j(i))|/(r w_i)$ is a $(1/w_i, 1)$-approximator of $|x_i|/r$ with
probability at least 2/3. By a Chernoff bound, their median
$\hat{x}_i = \mathrm{median}_{j \in [l]} \{ |H_j(h_j(i))|/(r w_i) \}$ is a $(1/w_i, 1)$-approximator to
$|x_i|/r$ with probability at least $1 - n^{-2}$. Taking a union bound over all $i \in [n]$
and applying the PSL (Lemma 1.2), we obtain that the PSL output, $\hat{\sigma} = \hat{\sigma}(r)$, is an
$(\epsilon/8, e^{\epsilon})$-approximator to $\|x\|_1/r$, with probability at least
$2/3 - 1/9 - 1/n^2 \ge 0.51$.

Now, if we had $r \le 4\|x\|_1$, then we would be done, as $r\hat{\sigma}$ would be an
$(\epsilon\|x\|_1/2, e^{\epsilon})$-approximator to $\|x\|_1$, and hence a $1 + 2\epsilon$ multiplicative
approximator (and this easily transforms to factor $1 + \epsilon$ by suitable scaling of $\epsilon$).
Without such a good estimate $r$, we try all possible values $r$ that are powers of 2, from high to
low, until we make the right guess. Notice that it is easy to verify whether the current guess $r$ is
sufficiently large that we can safely decrease it. Specifically, if $r > 4\|x\|_1$ then
$r\hat{\sigma} \le e^{\epsilon}\|x\|_1 + \epsilon r/8 < (r/4) \cdot [1 + 3\epsilon/2 + \epsilon/2] = (1 + 2\epsilon) r/4$.
However, if $r \le 2\|x\|_1$ then
$r\hat{\sigma} \ge e^{-\epsilon}\|x\|_1 - \epsilon r/8 \ge (r/2) \cdot [1 - \epsilon - \epsilon/4] > (1 + 2\epsilon) r/4$.
We also remark that, while we repeat Alg. 2 for $O(\log n)$ times (starting from a sufficiently large
$r$ suffices), there is no need to increase the probability of success, as the relevant events
$\mathcal{E}_r = \{\sum_i |x_i w_i| \le r m/3\}$ are nested and contain the last one, where
$r/\|x\|_1 \in [1,4]$. □
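The search over powers of 2 described in the proof can be sketched as follows. This is only an illustration: `estimate` is a hypothetical callable standing in for one invocation of Alg. 2 at scale $r$ (it is assumed to return $\hat{\sigma}(r)$), and the starting value `r_start` is an assumed input rather than something fixed by the text.

```python
def guess_scale(estimate, eps, r_start):
    """Halve r until the test r * sigma_hat >= (1 + 2*eps) * r / 4 passes.

    `estimate(r)` is a hypothetical stand-in for Alg. 2: it should return
    sigma_hat(r), an approximation of ||x||_1 / r.
    """
    r = r_start
    while r > 0:
        sigma_hat = estimate(r)
        if r * sigma_hat >= (1 + 2 * eps) * r / 4:
            # Guess accepted; under the proof's guarantees r <= 4 * ||x||_1 here.
            return r, r * sigma_hat
        r /= 2  # the estimate is small relative to r, so r can safely shrink
    raise ValueError("no scale accepted")
```

With an exact oracle `estimate = lambda r: norm / r`, the loop stops at the first power of 2 not exceeding $4\|x\|_1/(1+2\epsilon)$ and returns the norm itself.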
3.3 The Running Times

We now briefly discuss the runtimes of our algorithms: the update time of the sketching Alg. 1,
and the reconstruction time of Alg. 2.

It is immediate to note that the update time of our sketching algorithm is $O(\log n)$: one just
has to update $O(\log n)$ hash tables. We also note that we can compute a particular $w_i$ in
$O(\log n)$ time, which is certainly doable as $w_i$ may be generated directly from the seed used for
the pairwise-independent distribution. Furthermore, we note that we can sample from the distribution W =
W{k) in O(l) time (see, e.g., jjofllj). 
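To make the update-time remark concrete, here is a minimal sketch of a linear sketch with $l$ hash tables, where each update touches one cell per table. The class name, the modular hash family, and the use of a single $1/u$ draw for $w_i$ (that is, $k = 1$) are all assumptions made for illustration; they are not the paper's Alg. 1 verbatim.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, assumed larger than any index


class PrecisionSketch:
    """Illustrative linear sketch: l hash tables of m cells each."""

    def __init__(self, m, l, p=1.0, seed=0):
        rng = random.Random(seed)
        self.m, self.l, self.p, self.seed = m, l, p, seed
        # (a, b) pairs define the bucket hashes h_j and the sign hashes g_j.
        self.h = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(l)]
        self.g = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(l)]
        self.tables = [[0.0] * m for _ in range(l)]

    def bucket(self, j, i):
        a, b = self.h[j]
        return ((a * i + b) % PRIME) % self.m

    def sign(self, j, i):
        a, b = self.g[j]
        return 1 if ((a * i + b) % PRIME) % 2 == 0 else -1

    def weight(self, i):
        # Stand-in for w_i: regenerated from the seed on demand, never stored.
        u = 1.0 - random.Random(self.seed * 1_000_003 + i).random()  # u in (0, 1]
        return 1.0 / u

    def update(self, i, delta):
        """Apply x_i += delta; touches exactly one cell in each of the l tables."""
        scaled = delta * self.weight(i) ** (1.0 / self.p)
        for j in range(self.l):
            self.tables[j][self.bucket(j, i)] += self.sign(j, i) * scaled
```

Because every cell is a signed, weighted sum of coordinates, two sketches built with the same seed can be added cell by cell to obtain the sketch of the summed vector.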



Now we turn to the reconstruction time of Alg. 2. As currently described, this runtime is
$O(n \log n)$. One can improve the runtime by using the CountMin heavy hitters (HH) sketch
of flCM05£j at the cost of a 0(log(^)) factor increase in the space and update time. This 
improvement is best illustrated in the case of $\ell_1$ estimation. We construct the new sketch by
just applying the $O(t/m)$-HH sketch (Theorem 5 of [CM05a]) to the vector $x \cdot w$ (entry-wise
product). The HH procedure returns at most $O(m/t)$ coordinates $i$, together with
$(1/w_i, e^{\epsilon})$-approximators $\hat{x}_i$, for which it is possible that $\hat{x}_i w_i \ge t$
(note that, if the HH procedure does not return some index $i$, we can consider 0 as being its
approximator). This is enough to run the estimation procedure $E$ from PSL, which uses only $i$'s for
which $\hat{x}_i w_i \ge t$. Using the bounds from
flCM05a|1 , we obtain the following guarantees. The total space is 0(e- 1 lognlog(^) • m/t) = 
0(m log ra-log(^)) = 0(e- 3 log 2 n-log(^)). The update time is 0(log n ■ log(^p)) and recon- 
struction time is 0(log 2 n • log(^jp)). 

To obtain a similar improvement in reconstruction time for the $F_k$-moment problem, one uses
an analogous approach, except that one has to use HH with respect to the $\ell_2$ norm, instead of the
$\ell_1$ norm (considered in [CM05a]).



4 Applications II: Bounds via p-Type Constant



In this section, we show further applications of the PSL to streaming algorithms. As in Section 3,
our sketching algorithm will be linear, following the lines of the generic Alg. 1.

An important ingredient for our intended applications will be a variation of the notion of p-type 
of a Banach space (or, more specifically, the p-type constant). This notion will give a bound on the 
space usage of our algorithms, and hence we will bound it in various settings. Below we state the 
simplest such bound, which is a form of the Khintchine inequality. 

Lemma 4.1. Fix $p \in [1, 2]$, $n \ge 1$ and $x \in R^n$. Suppose that for each $i \in [n]$ we have two
random variables, $g_i \in \{-1, +1\}$ chosen uniformly at random, and $\chi_i \in \{0, 1\}$ chosen to
be 1 with probability $\alpha \in (0, 1)$ (and 0 otherwise). Then
$$E\Big[ \Big| \sum_i g_i \chi_i x_i \Big|^p \Big] \le \alpha \|x\|_p^p.$$

Furthermore, suppose each family of random variables $\{g_i\}_i$ and $\{\chi_i\}_i$ is only pairwise
independent and the two families are independent of each other. Then, with probability at least 7/9,
we have that
$$\Big| \sum_i g_i \chi_i x_i \Big|^p \le 3^{2+p} \alpha \|x\|_p^p.$$

The proof of this lemma appears in Section 6.
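As a sanity check of the first assertion of Lemma 4.1, the expectation can be computed exactly for a small vector by enumerating every sign pattern and every selection pattern. The function names below are hypothetical, and the check assumes fully independent $g_i$ and $\chi_i$; it illustrates the statement and is not a substitute for the proof.

```python
from itertools import product


def expected_pth_power(x, alpha, p):
    """Exact E|sum_i g_i chi_i x_i|^p for fully independent g_i and chi_i."""
    n = len(x)
    total = 0.0
    for chi in product((0, 1), repeat=n):
        ones = sum(chi)
        prob_chi = alpha ** ones * (1 - alpha) ** (n - ones)
        for g in product((-1, 1), repeat=n):
            s = sum(gi * ci * xi for gi, ci, xi in zip(g, chi, x))
            total += prob_chi * abs(s) ** p / 2 ** n  # each sign pattern has mass 2^-n
    return total


def lemma_bound(x, alpha, p):
    """Right-hand side alpha * ||x||_p^p."""
    return alpha * sum(abs(xi) ** p for xi in x)
```

For $p = 2$ the two sides coincide, since the cross terms vanish in expectation; for $p < 2$ the left side is strictly smaller on generic inputs.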
4.1 $\ell_p$-norm for $p \in [1,2]$

We now use Alg. 1 and 2 to estimate the $\ell_p$ norm for $p \in [1,2]$. We use Lemma 4.1 to bound the
space usage.

Theorem 4.2. Fix $p \in [1,2]$, $n \ge 6$, and $0 < \epsilon < 1/8$. There is a randomized linear
function $L : R^n \to R^S$, with $S = O(\epsilon^{-2-p} \log^2 n)$, and a deterministic estimation
algorithm $E$, such that for every $x \in R^n$, with probability at least 0.51, $E(L(x))$ is a factor
$1 + \epsilon$ approximation to $\|x\|_p^p$.

Proof. Our sketch function $L$ is given by Alg. 1. We set $\rho = \epsilon/8$. Let $W = W(k)$ for
$k = \zeta \rho^{-1} \epsilon^{-2}$ obtained from the PSL (Lemma 1.2). Define
$\omega = 10 \, E_{w \in W}[w \mid M]$, where event $M = M(w)$ satisfies $\Pr[M] \ge 1 - O(n^{-2})$.
Note that $\omega \le O(\epsilon^{-3} \log n)$. We set $m = \alpha\omega$ for a constant $\alpha > 0$
to be determined later.

We now describe the exact reconstruction procedure, which will be just several invocations of
Alg. 2 for different values of $r$. As in Theorem 3.3, we guess $r$ starting from the highest possible
value and halving it each time, until we obtain a good estimate: $\|x\|_p^p \le r \le 4\|x\|_p^p$
(alternatively, one could prepare for all possible $r$'s). To simplify the exposition, let us just
assume in the sequel that $r = 1$ and thus $1/4 \le \|x\|_p^p \le 1$.

Let $F_p = \sum_{i=1}^n |x_i|^p w_i$. Note that
$E[F_p \mid \cap_i M(w_i)] = \|x\|_p^p \cdot E_{w \in W}[w \mid M(w)] \le \omega/10$, and hence by
Markov's inequality, $F_p \le \omega$ with probability at least 8/9. Call this event $\mathcal{E}$ and
assume henceforth it occurs. To apply PSL, we need to prove that every $\hat{x}_i$ from Alg. 2 is a
good approximator to $x_i$.

Claim 4.3. Assume $F_p \le \omega$ and fix $i \in [n]$. If $\alpha \ge 3^{2+p} \epsilon^{1-p}$, then with
high probability, $\hat{x}_i$ is a $(1/w_i, e^{\epsilon})$-approximator to $|x_i|^p$.

Proof. Fix $j \in [l]$; we shall prove that $|H_j(h_j(i))|^p$ is a $(1, 1 + \epsilon)$-approximator to
$|x_i|^p w_i$, with probability at least 2/3. Then we would be done by a Chernoff bound, since
$\hat{x}_i$ is a median over $l = O(\log n)$ independent trials $j \in [l]$.

For $f \in [n]$, define $y_f = g_j(f) \cdot x_f w_f^{1/p}$ if $h_j(f) = h_j(i)$ and $y_f = 0$ otherwise.
Define $Y = H_j(h_j(i)) = y_i + \delta$, where $\delta = \sum_{f \ne i} y_f$. We apply Lemma 4.1 to
conclude that $E[|\delta|^p] \le F_p/m$, and hence $|\delta|^p \le 3\omega/m \le 3/\alpha$ with
probability at least 2/3. Assume henceforth this is indeed the case.

Now we distinguish two cases. First, suppose $|x_i w_i^{1/p}| > \frac{2}{\epsilon} \cdot |\delta|$. Then
$|Y|^p = (1 \pm \epsilon/2)^p |x_i|^p w_i$. Otherwise, $|x_i w_i^{1/p}| \le \frac{2}{\epsilon} \cdot |\delta|$,
and then
$$\big| |Y|^p - |x_i w_i^{1/p}|^p \big| \le (|x_i w_i^{1/p}| + |\delta|)^p - |x_i w_i^{1/p}|^p$$
$$\le |\delta|^p \cdot \big( (2/\epsilon + 1)^p - (2/\epsilon)^p \big)$$
$$\le |\delta|^p \cdot (2/\epsilon)^p \cdot (1 + p\epsilon - 1)$$
$$\le p 2^p \cdot 3 \cdot \epsilon^{1-p} / \alpha.$$

Thus, if we set $\alpha \ge 3^{2+p} (1/\epsilon)^{p-1}$, then in both cases $|Y|^p$ is a
$(1, e^{\epsilon})$-approximator to $|x_i|^p w_i$ (under the event that occurs with probability at
least 2/3). □



We can now complete the proof of Theorem 4.2. Applying Lemma 1.2, we obtain that its output
$\hat{\sigma} = \hat{\sigma}(r)$ is an $(\epsilon/8, e^{2\epsilon})$-approximator to $\|x\|_p^p$, with
probability at least $2/3 - 1/9 - 1/n^2 \ge 0.51$. □
4.2 Mixed and cascaded norms

We now show how to estimate mixed norms such as the $\ell_{p,q}$ norms. In the latter case, the input
is a matrix $x \in R^{n_1 \times n_2}$, and the $\ell_{p,q}$ norm is
$\|x\|_{p,q} = (\sum_i \|x_i\|_q^p)^{1/p}$, where $x_i$ is the $i$th row in the matrix.

We show a more general theorem, for the norm $\ell_p(X)$, which is defined similarly for a general
Banach space $X$; the $\ell_{p,q}$ norms will be just particular cases. To state the general result,
we need the following definition.

Definition 4.4. Fix $p \ge 1$, $n, \kappa \in N$, $\omega > 0$, $\delta \in [0,1)$, and let $X$ be a
finite dimensional Banach space. The generalized p-type, denoted
$\alpha(X, p, n, \kappa, \omega, \delta)$, is the biggest constant $\alpha > 0$ satisfying the
following: For each $i \in [n]$, let $g_i \in \{-1, +1\}$ be a random variable drawn uniformly at
random, and let $\chi_i \in \{0, 1\}$ be a random variable that is equal to 1 with probability
$1/\alpha$ and 0 otherwise. Furthermore, each family $\{g_i\}_i$ and $\{\chi_i\}_i$ is $\kappa$-wise
independent, and the two families are independent of each other. Then, for every
$x_1, \ldots, x_n \in X$ satisfying $\sum_{i \in [n]} \|x_i\|_X^p \le \omega$, we have
$$\Pr\Big[ \Big\| \sum_{i \in [n]} g_i \chi_i x_i \Big\|_X^p \le 1 \Big] \ge 1 - \delta.$$



Theorem 4.5. Fix $p \ge 1$, $n \ge 2$, and $0 < \epsilon < 1/3$. Let $X$ be a Banach space admitting a
linear sketch $L_X : X \to R^{S_X}$, with space $S_X = S_X(\epsilon)$, and let $E_X : R^{S_X} \to R$ be
its reconstruction procedure.

Then there is a randomized linear function $L : X^n \to R^S$, and an estimation algorithm $E$
which, for any $x \in X^n$, given the sketch $Lx$, outputs a factor $1 + \epsilon$ approximation to
$\|x\|_{p,X}$, with probability at least 0.51.

Furthermore, $S \le S_X(\epsilon/2) \cdot \alpha(X, p, n, \kappa, O(p \epsilon^{-4} \log n), 2/3) \cdot O(\log n)$,
where $\kappa$ is such that each function $g_j$ and $h_j$ is $\kappa$-wise independent.

We note that the result for $\ell_{p,q}$ norms will follow by proving some particular bounds on the
parameter $\alpha$, the generalized p-type. We discuss these implications after the proof of the theorem.



Proof of Theorem 4.5. Our sketch function $L$ is given by Alg. 1, with one notable modification:
the $x_i$'s are now vectors from $X$, and the hash table cells hold sketches given by the sketching
function $L_X$ up to $1 + \epsilon/2$ approximation. In particular, each cell of a hash table is
$H_j(z) = \sum_{i : h_j(i) = z} g_j(i) \cdot w_i^{1/p} \cdot L_X x_i$.
Furthermore, abusing notation, we use the notation $\|H_j(z)\|_X$ for some $z \in [m]$ to mean the
result of the $E_X$-estimation algorithm on the sketch $H_j(z)$ (since it is a $1 + \epsilon/2$
approximation, we can afford such additional multiplicative error).

We set $\rho = \epsilon/8$. Let $W = W(k)$ for $k = \zeta \rho^{-1} \epsilon^{-2}$ obtained from the
PSL (Lemma 1.2). Define $\omega = 10 \, E_{w \in W}[w \mid M]$, where event $M = M(w)$ satisfies
$\Pr[M] \ge 1 - O(n^{-2})$. Note that $\omega \le O(\epsilon^{-3} \log n)$. We set $m$ later.

We now describe the exact reconstruction procedure, which will be just several invocations of
Alg. 2 for different values of $r$. As in Theorem 3.3, we guess $r$ starting from high and
halving it each time, until we obtain a good estimate $\|x\|_{p,X}^p \le r \le 4\|x\|_{p,X}^p$
(alternatively, one could prepare for all possible $r$'s). For simplified exposition, we just assume
that $1/4 \le \|x\|_{p,X}^p \le 1$ and $r = 1$ in the rest.

Let $F_{p,X} = \sum_{i=1}^n \|x_i w_i^{1/p}\|_X^p$. Note that
$E[F_{p,X} \mid \cap_i M(w_i)] = \|x\|_{p,X}^p \cdot E_{w \in W}[w \mid M(w)] \le \omega/10$,
and hence $F_{p,X} \le \omega$ with probability at least 8/9 by Markov's bound. Call this event
$\mathcal{E}$. To apply PSL, we need to prove that the $\hat{x}_i$'s from Alg. 2 are faithful
approximators. For this, we prove that, for an appropriate choice of $\alpha = \alpha(p, X, \epsilon, n)$,
for each $j \in [l]$, $\|H_j(h_j(i))\|_X^p$ is a $(1, 1 + \epsilon)$-approximator to
$\|x_i\|_X^p w_i$, with probability at least 2/3. This would imply that, since $\hat{x}_i$ is a median
over $O(\log n)$ independent trials, $\hat{x}_i$ is a $(1/w_i, 1 + \epsilon)$-approximator to
$\|x_i\|_X^p$. Once we have such a claim, we apply Lemma 1.2, and conclude that the output,
$\hat{\sigma} = \hat{\sigma}(r)$, is an $(\epsilon/8, 1 + 2\epsilon)$-approximator to $\|x\|_{p,X}^p$
with probability at least $2/3 - 1/9 - 1/n \ge 0.51$.

Claim 4.6. Fix $p \ge 1$ and $\omega \in R_+$. Let $m = \alpha(X, p, \kappa, 3p\omega/\epsilon, 2/3)$,
the generalized p-type of $X$. Assume $F_{p,X} \le \omega$ and fix $i \in [n]$, $j \in [l]$. Then
$\|H_j(h_j(i))\|_X^p$ is a $(1, 1 + \epsilon)$-approximator to $\|x_i\|_X^p w_i$ with probability at
least 2/3.

Proof. For / E [n], define y/ = ■ XiW 1 ' p if = hj(i) and = otherwise. Then, 

a = ^2f £ [ n ] :hj ^f) =hj ^ 9j(i)xi = yi + 5, where <5 = Ylf&Vf- Then, by the definition of generalized 
p-type of X, whenever m > a(X,p, k,uj ■ ^2, 2/3), we have that \\5\\x < e/3, with probability at 
least 2/3. 

Now we distinguish two cases. First, suppose ||iCjU;^ p ||x > ■ Then ||a||^ ~ (1 ± 

e)!!^^!^^. Otherwise, if < ■ \\o~\\x, then 

||< < (|| x lW ] /p \\ x + \\5\\ X ) P < (2p\\5\\ x /e + \\5\\ x 7 < \\S\\ P X ■ (2p/e + If < 1. 

Hence, we conclude that ||a||^ (and thus \\Hj(hj(i))\\ p x ) is a (1, 1 + e)-approximator to 

with probability at least 2/3. □ 



The claim concludes the proof of Theorem 4.5. Note that the space is
$S = O(S_X(\epsilon/2) \cdot \alpha(X, p, \kappa, O(p \epsilon^{-4} \log n), 2/3) \cdot \log n)$. □

We now show the implications of the above theorem. For this, we present the following lemma,
whose proof is included in Section 6.

Lemma 4.7. Fix $n, m \in N$, $\omega \in R_+$, and a finite dimensional Banach space $X$. We have the
following bounds on the generalized p-type:

(a) if $0 < p \le q \le 2$, then $\alpha(\ell_q^m, p, n, 2, \omega, 2/3) \le O(\omega)$.

(b) if $p, q \ge 2$, we have that
$\alpha(\ell_q^m, p, n, 2q, \omega, 2/3) \le 9^2 q^{O(1)} \omega^{2/p} \cdot n^{1-2/p}$, and if
$q \ge 2$ and $p \in (0,2)$, then $\alpha(\ell_q^m, p, n, 2q, \omega, 2/3) \le 9^2 q^{O(1)} \omega^{2/p}$.

(c) for $p \ge 1$, we have that $\alpha(X, p, n, 2, \omega, 2/3) \le O(n^{1-1/p} \omega^{1/p})$, and
for $p \in (0,1)$, we have that $\alpha(X, p, n, 2, \omega, 2/3) \le O(\omega^{1/p})$.



Combining Theorem |4.5| and Lemma |4.7| , also using Theorem [O, we obtain the following linear 
sketches for $\ell_{p,q}$ norms, which are optimal up to $(\epsilon^{-1} \log n)^{O(1)}$ factors (see, e.g., [JW09]).

Corollary 4.8. There exist linear sketches for $\ell_p^{n_1}(\ell_q^{n_2})$, for $n_1, n_2 \le n$ and
$p, q \ge 1$, with the following space bounds $S$.

For $0 < p \le q \le 2$, the bound is $S = (\epsilon^{-1} \log n)^{O(1)}$.

For $q \ge 2$ and $p \in (0, 2)$, the bound is $S = n_2^{1-2/q} \cdot (pq\epsilon^{-1} \log n)^{O(1)}$.

For $p, q \ge 2$, the bound is $S = n_1^{1-2/p} n_2^{1-2/q} \cdot (pq\epsilon^{-1} \log n)^{O(1)}$.

For $p \ge 1$ and $q \in (0,p)$, the bound is $S = n_1^{1-1/p} \cdot (\epsilon^{-1} \log n)^{O(1)}$.

For $p \in (0, 1)$ and $q \in (0,p)$, the bound is $S = (\epsilon^{-1} \log n)^{O(1)}$.
5 Applications III: Sampling from the Stream

We now switch to a streaming application of a different type, $\ell_p$-sampling, where
$p \in [1,2]$. We obtain the following theorem.

Theorem 5.1. Fix $n \ge 2$, $p \in [1,2]$, and $0 < \epsilon < 1/3$. There is a randomized linear
function $L : R^n \to R^S$, with $S = O(\epsilon^{-p} \log^3 n)$, and an $\ell_p$-sampling algorithm
$A$ satisfying the following. For any non-zero $x \in R^n$, there is a distribution $D_x$ on $[n]$
such that $D_x(i)$ is an $(n^{-2}, 1 + \epsilon)$-approximator to $|x_i|^p / \|x\|_p^p$. Then $A$
generates a pair $(i, v)$ such that $i$ is drawn from $D_x$ (using the randomness of the function $L$
only), and $v$ is a $(0, 1 + \epsilon)$-approximator to $|x_i|^p$.

In this setting, the sketch algorithm is essentially Alg. 1, with the following minor
modification. We use $k = \zeta \epsilon^{-1} \cdot \log n$ for a sufficiently high $\zeta > 0$, and
choose $m = O(k \epsilon^{1-p} \log n) = O(\epsilon^{-p} \log^2 n)$ (note that the choice of $\rho$ is
irrelevant as it affects only parameter $k$, fixed directly). Furthermore, the algorithm is made to
use limited independence by choosing the $w_i$'s as follows. Fix $k$ seeds for a pairwise independent
distribution. Use each seed to generate the list $\{w_{i,j}\}_{i \in [n]}$, where each
$w_{i,j} = 1/u_{i,j}$ for random $u_{i,j} \in U(0,1)$. Then $w_i = \max_{j \in [k]} w_{i,j}$ for each
$i \in [n]$. Note that each $w_i$ has distribution $W = W(k)$. This method of generating the $w_i$'s
leads to an update time of $O(k + \log n) = O(\epsilon^{-1} \log n)$.
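The construction of the precisions from $k$ seeds can be sketched as below. The function name is hypothetical, and Python's `random.Random` is used only as a stand-in for a pairwise independent family; the lists are also generated eagerly for clarity, which does not reflect the stated update time.

```python
import random


def precisions(n, seeds):
    """Return w[i][j] = 1 / u_{i,j} and w_max[i] = max_j w[i][j].

    Each seed drives its own generator; random.Random is only a stand-in
    for a pairwise independent family.
    """
    columns = []
    for seed in seeds:
        rng = random.Random(seed)
        # 1 - random() lies in (0, 1], which avoids division by zero.
        columns.append([1.0 / (1.0 - rng.random()) for _ in range(n)])
    w = [[columns[j][i] for j in range(len(seeds))] for i in range(n)]
    w_max = [max(row) for row in w]
    return w, w_max
```

Re-running the function with the same seeds reproduces the same precisions, which is what lets the sketching and sampling stages agree on the $w_{i,j}$ without storing them.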

Given the sketch, the sampling algorithm proceeds as described in Alg. 3 (using the $w_{i,j}$'s
defined above). We set $r$ to be a 2 approximation to $\|x\|_p^p$, which is easy to compute separately
(see, e.g., Theorem 4.2). So, below we just assume that $1/2 \le \|x\|_p^p \le 1$ and $r = 1$.



Algorithm 3: $\ell_p$-sampling algorithm. Input consists of $l$ hash tables $H_j$, precisions
$w_{i,j}$ for $i \in [n]$, $j \in [k]$, and a real $r > 0$.



Compute Xj = median 



where lOj = max je [ fc ] Wjj. 



2 We compute the following quantities $s_{i,j} \in \{0, 1\}$ for $i \in [n]$ and $j \in [k]$. For each
$i \in [n]$, $j \in [k]$, let $s_{i,j} = 1$ if $\hat{x}_i w_{i,j} \ge t = 4/\epsilon$ and 0 otherwise.

3 Let $j^*$ be the smallest $j \in [k]$ such that there is exactly one $i \in [n]$ with $s_{i,j^*} = 1$.

4 If such j* exists, return (i*,Xi* ■ r p /t) where i* is the unique i* with Si*j* = 1. 

5 If no j* exists, return FAIL. 
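Steps 2 to 5 of Algorithm 3 can be sketched as follows, taking the step-1 estimates as an input. The function name is hypothetical, and the sketch returns only the selected index (or `None` for FAIL); the accompanying value of step 4 is deliberately left out.

```python
def sample_index(x_hat, w, eps):
    """Find the first column j with exactly one i passing the threshold t = 4/eps.

    x_hat[i] is the step-1 estimate and w[i][j] the precision w_{i,j}.
    Returns the selected index i*, or None to signal FAIL.
    """
    t = 4.0 / eps
    n, k = len(x_hat), len(w[0])
    for j in range(k):
        # s_{i,j} = 1 exactly for the indices collected in `hits`.
        hits = [i for i in range(n) if x_hat[i] * w[i][j] >= t]
        if len(hits) == 1:
            return hits[0]
    return None
```

For example, with `eps = 0.5` (so `t = 8`), a column in which two indices pass the threshold is skipped, and the first column with a single passing index determines the sample.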



Proof of Theorem 5.1. Let $\omega = 10 \, E_{w \in W}[w \mid M] = O(k \log n)$, where event
$M = M(w)$ satisfies $\Pr[M] \ge 1 - O(n^{-2})$. We choose the constant in front of $m$ such that
$m \ge \alpha\omega$ for $\alpha = 3^{2+p} \epsilon^{1-p}$.

Define $F_p = \sum_{i \in [n]} (x_i w_i^{1/p})^p$. Note that
$E[F_p \mid \cap_i M(w_i)] = \|x\|_p^p \cdot \omega/10$. Hence $F_p \le \omega$
with probability at least $9/10 - O(1/n) \ge 8/9$. By Claim 4.3, we deduce that $\hat{x}_i$ is a
$(1/w_i, e^{\epsilon})$-approximator to $|x_i|^p$, with high probability.

We now prove that the reconstruction algorithm samples an element $i$ with the desired
distribution. We cannot apply PSL black-box anymore, but we will reuse some of the ingredients of PSL
below. Let $a_i = |x_i|^p \in [0, 1]$, and $\hat{a}_i = \hat{x}_i$. Note that $\sum_i a_i \in [1/2, 1]$.

The proof of correctness follows the outlines of the PSL proof. We bound the probability that
$s_{i,j} = 1$, for $i \in [n]$, $j \in [k]$, as follows:

1/t • aie" 3e/2 < Pr[ai > t/wi + 1/wi] < Pr[s id = 1] < Pr[ai >t/ Wi - 1/vh] <l/t- aie 3e/2 . 
Hence, for fixed $j$,
$\sum_i \Pr[s_{i,j} = 1] \le 1/t \cdot e^{3\epsilon/2} \sum_i a_i \le 1/t \cdot e^{3\epsilon/2} \le \epsilon/2$.
Then, using pairwise independence, for fixed $i, j$, we have that $s_{i,j} = 1$ while all the other
$s_{i',j} = 0$ for $i' \in [n] \setminus \{i\}$ with probability that satisfies
$$1/t \cdot a_i e^{-3\epsilon/2} \cdot (1 - \epsilon/2) \le \Pr\Big[s_{i,j} = 1 \wedge \sum_{i' \ne i} s_{i',j} = 0\Big] \le 1/t \cdot a_i e^{3\epsilon/2}. \quad (7)$$

Thus, $\sum_i s_{i,j} = 1$ with probability at least $\Omega(\epsilon)$. Furthermore, since the events
for different $j \in [k]$ for $k = O(\epsilon^{-1} \log n)$ are independent, the algorithm is
guaranteed to not fail (i.e., reach step 5) with high probability.

It remains to prove that $i^*$ is chosen from some distribution $D_x$, such that $D_x(i)$ is an
$(O(n^{-2}), 1 + O(\epsilon))$-approximator to $|x_i|^p / \|x\|_p^p$. Indeed, consider $j = j^*$, i.e.,
condition on the fact that $\sum_i s_{i,j} = 1$. Then,
$$\Pr[i^* = i] = \frac{\Pr[s_{i,j} = 1 \wedge \sum_{i' \ne i} s_{i',j} = 0]}{\Pr[\sum_{i'} s_{i',j} = 1]}$$
which, by Eqn. (7), is an $(n^{-2}, e^{O(\epsilon)})$-approximator to $|x_i|^p / \|x\|_p^p$ (the
$O(n^{-2})$ term comes from conditioning on event $M$). Also, note that, since $s_{i^*,j} = 1$, we have
that $\hat{x}_{i^*}$ is a $(0, e^{O(\epsilon)})$-approximator to $|x_{i^*}|^p$.

Scaling $\epsilon$ appropriately gives the claimed conclusion.

The space bound is $S = O(m \log n) = O(\epsilon^{1-p} k \log^2 n) = O(\epsilon^{-p} \log^3 n)$.

□ 



6 Proofs of p-type inequalities



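Proof of Lemma 4.1. Let us denote $g = (g_1, \ldots, g_n)$ and $\chi = (\chi_1, \ldots, \chi_n)$. Since
$z^{p/2}$ is concave for $p \le 2$, a random variable $Z \ge 0$ satisfies $E[Z^{p/2}] \le (E Z)^{p/2}$,
and thus
$$E_{g,\chi}\Big[ \Big| \sum_i g_i \chi_i x_i \Big|^p \Big]
\le E_{\chi}\Big[ \Big( E_g\Big[ \Big( \sum_i g_i \chi_i x_i \Big)^2 \Big] \Big)^{p/2} \Big].$$

Now using (pairwise) independence of the sequence $g_1, \ldots, g_n$, and the fact that
$\|z\|_2 \le \|z\|_p$, we conclude that
$$E_{g,\chi}\Big[ \Big| \sum_i g_i \chi_i x_i \Big|^p \Big]
\le E_{\chi}\Big[ \Big( \sum_i (\chi_i x_i)^2 \Big)^{p/2} \Big]
\le E_{\chi}\Big[ \sum_i |\chi_i x_i|^p \Big] = \alpha \|x\|_p^p.$$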



We proceed to prove the lemma's second assertion. Since $g$ and $\chi$ are independent, for every
fixed $\chi$ we have, by Markov's inequality, that with probability at least 8/9 (over the choice of $g$),
$$\Big| \sum_i g_i \chi_i x_i \Big|^2 \le 9 \sum_i (\chi_i x_i)^2.$$

Call a vector $g$ satisfying the above "good" for the given vector $\chi$. We henceforth restrict
attention only to $g$ that is indeed good (for the relevant $\chi$, which is now a random variable),
and we get
$$E_{\chi}\Big[ \Big| \sum_i g_i \chi_i x_i \Big|^p \Big]
\le 3^p E_{\chi}\Big[ \Big( \sum_i (\chi_i x_i)^2 \Big)^{p/2} \Big]
\le 3^p E_{\chi}\Big[ \sum_i |\chi_i x_i|^p \Big] = 3^p \alpha \|x\|_p^p,$$
where the last inequality used again the fact that $\|z\|_2 \le \|z\|_p$. Now using Markov's inequality
over the choice of $\chi$, with probability at least 8/9 we have
$|\sum_i g_i \chi_i x_i|^p \le 3^{2+p} \alpha \|x\|_p^p$. The lemma now follows by recalling that $\chi$
and $g$ are independent (or a union bound). □



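Proof of Lemma 4.7. For part (a), suppose that $0 < p \le q \le 2$. We note that:
$$\Big\| \sum_i g_i \chi_i x_i \Big\|_q^p = \Big( \sum_j \Big| \sum_i g_i \chi_i x_{i,j} \Big|^q \Big)^{p/q}. \quad (8)$$

We want to bound $\sigma(\chi, g) = \sum_j |\sum_i g_i \chi_i x_{i,j}|^q$ for a fixed vector $\chi$ and
random vector $g$. For fixed $j$, we have that, using concavity of $x^{q/2}$, pairwise-independence,
and norm-inequality respectively:
$$E_g\Big[ \Big| \sum_i g_i \chi_i x_{i,j} \Big|^q \Big]
\le \Big( E_g\Big[ \Big( \sum_i g_i \chi_i x_{i,j} \Big)^2 \Big] \Big)^{q/2}
= \Big( \sum_i (\chi_i x_{i,j})^2 \Big)^{q/2}
\le \sum_i \chi_i |x_{i,j}|^q.$$

By linearity of expectation,
$E_g[\sigma(\chi, g)] \le \sum_i \chi_i \sum_j |x_{i,j}|^q = \sum_i \chi_i \|x_i\|_q^q$. By Markov's
bound, we have that $\sigma(\chi, g) \le 9 \sum_i \chi_i \|x_i\|_q^q$ with probability at least 8/9
(over the choice of $g$). Call such $g$ good. Plugging this into Eqn. (8), since the p-norm upper
bounds the q-norm, we have that:
$$\Big\| \sum_i g_i \chi_i x_i \Big\|_q^p \le 9 \cdot \sum_i \chi_i \|x_i\|_q^p.$$

Conditioned on good $g$, by taking the expectation over the $\chi_i$'s and using Markov's bound, we
obtain that
$$\Big\| \sum_i g_i \chi_i x_i \Big\|_q^p \le \frac{9}{\alpha} \cdot 9 \|x\|_{p,q}^p$$
with probability at least 8/9 over the choice of $\chi$. Hence, $\|\sum_i g_i \chi_i x_i\|_q^p \le 1$
as long as $\alpha = 9^2 \|x\|_{p,q}^p \le 9^2 \omega$, with probability at least 7/9 over the choice
of $g$ and $\chi$.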
For part |(b)|, suppose that q > 2. As before, since 



2/q 



^2diXi 



E 



E»« 
we want to bound $\sigma(\chi, g) = \sum_j \sigma_j(\chi, g)$, where
$\sigma_j(\chi, g) = |\sum_i \chi_i g_i x_{i,j}|^q$. For fixed $\chi$, we compute the expectation
$E_g[\sigma_j(\chi, g)]$. For this we compute the moment $\kappa = 2\lceil q/2 \rceil$ of
$|\sum_i g_i \chi_i x_{i,j}|$. For convenience, define $y_i = \chi_i x_{i,j}$. We have that



M K 4 E, 



k/2 



Hence, by concavity of /(z) = we have 



q/2 



7 0(9) 



E^ 



9/2 



Thus, we have that a(x,g) < 9<?°^ Sj II Si Xi x ij II s^ 2 w ith probability at least 8/9. Again call 
such g's good. For such a good g, we now have that, by triangle inequality (in norm q/2): 

2 / / x q/2\ 2 /9 



E 



QiXiXi 



^ 9 ?° {1) ■ (e (E(w) 2 ) " j ^ 9 ?° (1) • E n**< 



2 



Conditioned on good g, again by taking expectation over \ and using Markov's bound, we obtain 
that, with probability at least 8/9, 

2 



E 



QiXiXi 



2 _ O(l) . l 



<9-q 



E 



J i \\q- 



Finally, we distinguish the cases where $p \ge 2$ and where $p \in (0,2)$. If $p \ge 2$, then using that
$\sum_i \|x_i\|_q^2 \le n^{1-2/p} \cdot \|x\|_{p,q}^2 \le n^{1-2/p} \omega^{2/p}$, we conclude that, with
probability at least 7/9 over $g, \chi$, we have that $\|\sum_i g_i \chi_i x_i\|_q^2 \le 1$ as long as
$\alpha \ge 9^2 q^{O(1)} n^{1-2/p} \omega^{2/p}$. Similarly, if $p \in (0,2)$, then
$\sum_i \|x_i\|_q^2 \le \|x\|_{p,q}^2 \le \omega^{2/p}$, and we conclude that, with probability at least
7/9 over $g, \chi$, we have that $\|\sum_i g_i \chi_i x_i\|_q^2 \le 1$. We note that we just used
$\kappa$-wise independence, where $\kappa \le q + 2$.

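We now prove part (c), which just follows from a triangle inequality. Namely, we observe that
$$\Big\| \sum_i \chi_i g_i x_i \Big\|_X \le \sum_i \chi_i \|x_i\|_X.$$

Hence, taking expectation and applying Markov's bound, we obtain, with probability at least 8/9,
the following. If $p \ge 1$, then
$$\Big\| \sum_i \chi_i g_i x_i \Big\|_X \le \frac{9}{\alpha} \|x\|_{1,X}
\le \frac{9}{\alpha} n^{1-1/p} \|x\|_{p,X} \le \frac{9}{\alpha} n^{1-1/p} \omega^{1/p},$$
and taking $\alpha \ge 9 n^{1-1/p} \omega^{1/p}$ is then enough. If $p \in (0, 1)$, then
$$\Big\| \sum_i \chi_i g_i x_i \Big\|_X \le \frac{9}{\alpha} \|x\|_{1,X} \le \frac{9}{\alpha} \omega^{1/p},$$
and taking $\alpha \ge 9 \omega^{1/p}$ is enough.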



□ 
7 A lower bound on the total precision

We now deduce a lower bound on $\frac{1}{n} \sum_i E[w_i]$, and show it is close to the upper bound
that we obtain in Lemma 1.2.

Theorem 7.1. Consider the same setting as in the Precision Sampling Lemma (Lemma 1.2).
Let $\{a_i\}$ be a sequence of numbers in $[0,1]$. Let $\{w_i\}_{i \in [n]}$ be a sequence generated by a
random process, independent of the sequence $\{a_i\}$. Let $R$ be an algorithm with the following
properties. The algorithm obtains both $\{w_i\}$ and a sequence $\{\hat{a}_i\}_{i \in [n]}$, where each
$\hat{a}_i$ is an arbitrary $(1/w_i, 1)$-approximator to $a_i$. The algorithm outputs a value
$\hat{\sigma}$ that is a $(\rho, e^{\epsilon})$-approximator to $\sigma = \sum_{i \in [n]} a_i$
with probability at least 2/3.

Let $\alpha = \max\{\rho/\epsilon, (6\epsilon)^{-4}\}$. If $\epsilon \in (0, 1/48)$ and
$\alpha \le n/16$, then there exists an absolute positive constant $C$ such that
$\frac{1}{n} \sum_{i \in [n]} E[w_i] \ge \frac{C}{\rho\epsilon} \cdot \log(n/\alpha)$.

Note that our lower bound is essentially off by a factor of $\epsilon$ from PSL.

We now prove the theorem. We start by adapting the lemma that shows that the Hoeffding 
bound is nearly optimal. 

Lemma 7.2 (Based on Theorem 1 of [CEG95]). Let $\epsilon \in (0, 1/8)$. Let $f$ be a function from
$[n]$ to $\{0, 1\}$. Let $t$ be a positive integer such that $t \le \sqrt{n/3} - 1$. Let $A$ be a
randomized algorithm that always queries the value of $f$ on at most $t$ different inputs, and outputs
an estimate $\hat{\sigma}$ to $\sigma = \frac{1}{n} \sum_{i \in [n]} f(i)$.
If $|\hat{\sigma} - \sigma| < \epsilon$ with probability at least 7/12, then $t \ge C/\epsilon^2$, where
$C$ is a fixed positive constant.

Proof. Let 5 be a bound on the probability that the algorithm returns an incorrect estimate. In 
the proof of Theorem 1 in [ CEG95H , it is shown that 

[tySl-i , N / n-t \ 
t\ \\n{l/2+eX\-i) 



± 



i=0 W \\n(l/2+e)V 

For each i e {0, . . . , \t/2] — 1}, we have 

) (n-t)\ \n(l/2 + e)]\ [n(l/2-e)\\ 



^n(l/2+e)1-i 

( rn(1/ n 2+e)1 ) n! ' (Kl/2 + e)l - i)\ ' (Ln(l/2-e)J -t + i)\ 

> n - * • (n(l/2 + e - i/n)) 1 ■ (n(l/2 - e - (t - i + l)/n)) t_i 
= 2~ l • (1 + 2e - ijnf ■ (1 - 2e - (t - i + l)/n)* _i 

> 2~* ■ (1 + 2e - (i + l)/nY ■ (1 - 2e - (t + l)/n) t_i . 



Since $\epsilon < 1/8$, $1 - 2\epsilon > 3/4$. Since $t \le \sqrt{n/3} - 1$, we have
$(t + 1)^2 \le n/3$ and therefore, $(t + 1)/n \le 1/(3(t + 1)) < 1/(3t)$. We have
$((t + 1)/n)/(1 - 2\epsilon) < 4/(9t)$. This implies both
$$(1 - 2\epsilon - (t + 1)/n) > (1 - 2\epsilon) \cdot (1 - 4/(9t)),$$
and
$$(1 + 2\epsilon - (t + 1)/n) > (1 + 2\epsilon) \cdot (1 - 4/(9t)).$$
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We obtain 



(\n(l/2+e)-]-i) 



> 2-* • (1 + 2e)< • (1 - 2ef~ l ■ (1 - 4/(9t))*. 



(rn(l/2+e)l) 

One can show that for $\delta \in [0, 1/2]$, $1 - \delta \ge e^{-2\delta}$. Hence
$$(1 - 4/(9t))^t \ge e^{-2 \cdot \frac{4}{9t} \cdot t} \ge 1/4,$$

and therefore, 



n—t 



\n(l/2+e)]-i) 
(rn(l/2+e)l) 



> 2~' -2 • (1 + 2e) 1 • (1 - 2e) 



t-i 



We plug this bound into the inequality from [CEG95] and obtain 



5 > 2 



-i-2 



ft/21 -1 

£ 

i=0 



(1 + 2e)* • (1 - 2e) 



t-i 



rt/2i-i 



> 2 -*- 2 • (i + 2e)r*/ 2 i-rV*/2i - (i — 2e) r*/2i+rV*/2i . ^ 

i=\t/2\-\^t/i\ 

> 2-'~ 2 • (1 - 4e 2 f /2] ~ { ^ ■ (1 " 2e) 2 ^l . \^/T/2] 



> 4 . e - 8e ( r * /2] - r v */ 2 i ) . e - 8e rv*/2i 



2* vr*/2i - rvw 



\t/2] ~ \y/t/2] 

V 



Using Stirling's approximation ^/2~^/e fe+1 / 2 e~ fc+1 /( m+1 ) < fc! < y/2~Kk k+1 ^ 2 e~ k+1 ^ 12k \ one can 
show that there is a positive constant C\ such that 

fr / n *r ml >C 1 -2 t /Vi. 
\\t/2] - \y/t/2]J - ' 

Plugging this into the previous inequality, we obtain for some positive constant C2, 

5 > C 2 ■ exp (-8e 2 ([t/2] - \^/tJ2~\) - 8e\y/t/2]} . 

This shows that for very small $\delta$ (namely, for $\delta \le C_2/C_3$, where $C_3$ is a
sufficiently large constant), $t \ge C_4 \cdot \frac{1}{\epsilon^2} \cdot \log(1/\delta)$, where $C_4$
is a positive constant.

Note that even if $\delta$ is a relatively large constant less than 1/2 (5/12 in our case),
$t \ge C_5 \cdot \frac{1}{\epsilon^2}$, for some positive $C_5$. This is the case because, if we had a
better dependence on $\epsilon$ in this case, we could obtain a better dependence on $\epsilon$ also
for small $\delta$ by routinely amplifying the probability of success of the algorithm, which incurs
an additional multiplicative factor of only $O(\log(1/\delta))$. This finishes the proof. □

The above lemma shows a lower bound on the maximum number of queries. In the following 
corollary we extend the bound to the expected number of queries. 
Corollary 7.3 (Based on Corollary 2 of [CEG95]). Let $\epsilon \in (0, 1/8)$, and let
$n \ge 1/\epsilon^4$. Let $f$ be a function from $[n]$ to $\{0, 1\}$. Let $A$ be a randomized algorithm
that outputs an estimate $\hat{\sigma}$ to $\sigma = \frac{1}{n} \sum_{i \in [n]} f(i)$.
If $|\hat{\sigma} - \sigma| < \epsilon$ with probability at least 2/3, then the expected number of
queries of $A$ to $f$ is at least $C/\epsilon^2$ for some function $f$, where $C$ is an absolute
positive constant.

Proof. Let $t$ be the maximum expected number of queries of $A$ to $f$, where the maximum is taken
over all functions $f : [n] \to \{0, 1\}$. Consider an algorithm $A'$ that does the following. It
simulates $A$ until $A$ attempts to make a $(\lfloor 12t \rfloor + 1)$-th query. In this case $A'$
interrupts the execution of $A$, and outputs 0. Otherwise $A'$ returns the output of $A$. The
probability that $A'$ returns an incorrect answer is bounded by $1/3 + 1/12 = 5/12$. By Lemma 7.2,
$A'$ makes at least $C_1/\epsilon^2$ queries, where $C_1$ is a positive constant. Hence
$12t \ge C_1/\epsilon^2$, which proves the claim. □
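The truncation argument in the proof of Corollary 7.3 can be illustrated with a small wrapper that builds $A'$ from $A$. The names below are hypothetical, and the wrapper counts every query rather than distinct inputs, which is a simplification of the setting in Lemma 7.2.

```python
import math


class QueryBudgetExceeded(Exception):
    """Raised internally when the wrapped algorithm asks for one query too many."""


def truncate(algorithm, t):
    """Build A' from A: abort and output 0 once A asks for query floor(12 t) + 1."""
    budget = math.floor(12 * t)

    def wrapped(f):
        used = 0

        def counted(i):
            nonlocal used
            if used >= budget:
                raise QueryBudgetExceeded
            used += 1
            return f(i)

        try:
            return algorithm(counted)
        except QueryBudgetExceeded:
            return 0.0  # A' interrupts A and outputs 0

    return wrapped
```

By Markov's inequality, an algorithm whose expected query count is at most $t$ exceeds the budget of $12t$ queries with probability at most 1/12, which is the source of the extra 1/12 failure probability in the proof.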

Finally we show a bound on the expectation $E[w_i]$. The bound uses the fact that the $w_i$'s have
to be distributed in such a way that we are able to both observe many small $a_i$'s and few large
$a_i$'s. Intuitively, there are roughly $\Theta(\log n)$ different possible magnitudes of $a_i$'s, and
$w_i$'s of different size must be used to efficiently observe a sufficiently large number of $a_i$'s of
each magnitude. This yields an additional logarithmic factor in the lower bound.



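Proof of Theorem 7.1. Consider the case of $\sigma$ between 0 and $\rho/\epsilon$. If $\hat{\sigma}$
is a $(\rho, e^{\epsilon})$ estimator for $\sigma$, then
$$\sigma \cdot e^{-\epsilon} - \rho \le \hat{\sigma} \le \sigma \cdot e^{\epsilon} + \rho,$$
$$\sigma \cdot (1 - \epsilon) - \rho \le \hat{\sigma} \le \sigma \cdot (1 + 2\epsilon) + \rho,$$
$$\sigma - 2\rho \le \hat{\sigma} \le \sigma + 3\rho,$$
$$|\hat{\sigma} - \sigma| \le 3\rho.$$

Therefore, the estimator is also an additive approximation for $\sigma$.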
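The conversion from a $(\rho, e^{\epsilon})$ guarantee to an additive one can be checked numerically. The helper below is a hypothetical illustration: it tests whether the whole window allowed by the multiplicative-plus-additive guarantee sits inside $[\sigma - 2\rho, \sigma + 3\rho]$, which is what the chain of inequalities asserts when $0 \le \sigma \le \rho/\epsilon$.

```python
import math


def additive_window_holds(sigma, rho, eps):
    """Check that [sigma*e^-eps - rho, sigma*e^eps + rho] lies in [sigma - 2 rho, sigma + 3 rho]."""
    low = sigma * math.exp(-eps) - rho
    high = sigma * math.exp(eps) + rho
    return sigma - 2 * rho <= low and high <= sigma + 3 * rho
```

The check passes throughout the range $\sigma \le \rho/\epsilon$ and fails once $\sigma$ is much larger than $\rho/\epsilon$, which is why the proof restricts attention to that range.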

Consider an integer $j$ such that $(\rho/\epsilon) \le 2^j$ and $(6\epsilon)^{-4} \le 2^j \le n$. We
create a sequence $\{a_i\}$ as follows. Let $f$ be a function from $[2^j] \to \{0, 1\}$. We select a
subset $I \subseteq [n]$ of size $2^j$ uniformly at random. For each $i \notin I$, we set $a_i = 0$.
For $i \in I$, we set $a_i = (1 + f(k))/2 \cdot (\rho/\epsilon)/2^j$, where $k$ is the rank of $i$ in
$I$. We have $\sigma = \frac{\rho}{2\epsilon} (1 + 2^{-j} \sum_{x \in [2^j]} f(x))$. Therefore, $R$ has
to compute an additive $3\rho/(\rho/(2\epsilon)) = 6\epsilon$ approximation to
$2^{-j} \sum_{x \in [2^j]} f(x)$ with probability at least 2/3, where
the probability is taken over the random bits of $R$ and the random choice of $\{w_i\}$.
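The hard instance can be generated as in the following sketch, which is meant only to make the formula for $\sigma$ tangible. The function name and the seeded `random.Random` are assumptions, ranks are 0-based here, and the toy parameters in the usage note ignore the requirement $(6\epsilon)^{-4} \le 2^j$.

```python
import random


def hard_instance(n, j, rho, eps, f, seed=0):
    """Build {a_i}: a random subset of size 2^j carries values encoding f."""
    size = 2 ** j
    subset = sorted(random.Random(seed).sample(range(n), size))
    a = [0.0] * n
    for rank, i in enumerate(subset):  # rank of i within the subset, 0-based
        a[i] = (1 + f(rank)) / 2 * (rho / eps) / size
    return a
```

For instance, with `n = 64`, `j = 3`, `rho = 0.05`, `eps = 0.01` and `f(k) = k % 2`, the entries sum to $\frac{\rho}{2\epsilon}(1 + 1/2) = 3.75$, matching the expression for $\sigma$ above.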

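We now create a corresponding sequence $\{\hat{a}_i\}$. For $i \notin I$, we set $\hat{a}_i = 0$. For
$i \in I$, if $1/w_i < \frac{\rho}{2^{j+1}\epsilon}$, we set $\hat{a}_i = a_i$, and
$\hat{a}_i = \frac{\rho}{2^{j+1}\epsilon}$ otherwise. Effectively, $R$ can only see the values $f(k)$
for $k$ such that $1/w_i < \frac{\rho}{2^{j+1}\epsilon}$, where $i$ is the item of rank $k$ in $I$. Let
$E_j$ be the expected number of indexes $i$ for which $w_i > \frac{2^{j+1}\epsilon}{\rho}$. The
expected number of values of $f$ that $R$ can see is then $\frac{2^j}{n} \cdot E_j$.

By Corollary 7.3,
$$\frac{2^j}{n} \cdot E_j \ge C_1/(6\epsilon)^2,$$
where $C_1$ is an absolute positive constant. Therefore,
$$E_j \ge \frac{C_2 n}{2^j \epsilon^2}$$
for another absolute constant $C_2$.

Consider now the expectation of the sum of all $w_i$'s:
$$E\Big[\sum_i w_i\Big] \ge \sum_{j \in Z} \frac{2^{j+1}\epsilon}{\rho} \cdot
E\Big[\#\Big\{i : w_i \in \Big(\frac{2^{j+1}\epsilon}{\rho}, \frac{2^{j+2}\epsilon}{\rho}\Big]\Big\}\Big]$$
$$\ge \sum_{j \in Z} \frac{2^{j}\epsilon}{\rho} \cdot
E\Big[\#\Big\{i : w_i > \frac{2^{j+1}\epsilon}{\rho}\Big\}\Big]$$
$$\ge \sum_{j : \max\{\rho/\epsilon, (6\epsilon)^{-4}\} \le 2^j \le n} \frac{2^{j}\epsilon}{\rho} \cdot E_j$$
$$\ge \sum_{j : \max\{\rho/\epsilon, (6\epsilon)^{-4}\} \le 2^j \le n} \frac{C_2 n}{\rho\epsilon}$$
$$= \frac{C_2 n}{\rho\epsilon} \cdot \big( \lfloor \log n \rfloor - \lceil \log \max\{\rho/\epsilon, (6\epsilon)^{-4}\} \rceil + 1 \big)
\ge \frac{C_3 n}{\rho\epsilon} \log(n/\alpha),$$
where $C_3$ is a fixed positive constant. This finishes the proof.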



□ 
A Bound on $E_{w \in W}[w^a]$

Claim A.1. For $k \ge 1$, suppose $u_j$ are drawn uniformly at random from $[0,1]$. Then, for any
$a \in (0, 1)$, we have that
$$E_u\Big[ \Big( \max_j 1/u_j \Big)^a \Big] \le O\Big( \frac{k^a}{1-a} \Big).$$

Proof. We compute the expectation directly:
$$E_u\Big[ \Big( \max_j 1/u_j \Big)^a \Big] = \int_0^1 u^{-a} \cdot k(1 - u)^{k-1} \, du$$
$$\le \int_0^{1/k} k \cdot u^{-a} \, du + k^a \cdot \int_{1/k}^1 k(1 - u)^{k-1} \, du$$
$$= \frac{k \cdot (1/k)^{1-a}}{1-a} + k^a \cdot \Big[ -(1 - u)^k \Big]_{1/k}^{1}$$
$$\le O\Big( \frac{k^a}{1-a} \Big).$$



□ 
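As a numerical companion to Claim A.1, the expectation has a closed form: since $\min_j u_j$ has density $k(1-u)^{k-1}$, the integral equals $k \cdot B(1-a, k) = \Gamma(1-a)\Gamma(k+1)/\Gamma(k+1-a)$. The function below evaluates it; the function name and the explicit constant 2 used in the comparison are assumptions of this sketch, not constants stated in the claim.

```python
import math


def exact_moment(k, a):
    """E[(max_j 1/u_j)^a] = Gamma(1 - a) * Gamma(k + 1) / Gamma(k + 1 - a)."""
    # Work with log-gamma so that large k does not overflow.
    return math.exp(math.lgamma(1 - a) + math.lgamma(k + 1) - math.lgamma(k + 1 - a))
```

For $k = 1$ this reduces to $1/(1-a)$, and across a range of $k$ and $a$ the value stays below $2k^a/(1-a)$, in line with the $O(k^a/(1-a))$ bound.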
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